/* Basic block reordering routines for the GNU compiler.
   Copyright (C) 2000-2018 Free Software Foundation, Inc.

   This file is part of GCC.

   GCC is free software; you can redistribute it and/or modify it
   under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3, or (at your option)
   any later version.

   GCC is distributed in the hope that it will be useful, but WITHOUT
   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
   License for more details.

   You should have received a copy of the GNU General Public License
   along with GCC; see the file COPYING3.  If not see
   <http://www.gnu.org/licenses/>.  */

/* This file contains the "reorder blocks" pass, which changes the control
   flow of a function to encounter fewer branches; the "partition blocks"
   pass, which divides the basic blocks into "hot" and "cold" partitions,
   which are kept separate; and the "duplicate computed gotos" pass, which
   duplicates blocks ending in an indirect jump.

   There are two algorithms for "reorder blocks": the "simple" algorithm,
   which just rearranges blocks, trying to minimize the number of executed
   unconditional branches; and the "software trace cache" algorithm, which
   also copies code, and in general tries a lot harder to have long linear
   pieces of machine code executed.  This algorithm is described next.  */

/* This (greedy) algorithm constructs traces in several rounds.
   The construction starts from "seeds".  The seed for the first round
   is the entry point of the function.  When there is more than one seed,
   the one with the lowest key in the heap is selected first (see bb_to_key).
   Then the algorithm repeatedly adds the most probable successor to the end
   of a trace.  Finally it connects the traces.

   There are two parameters: Branch Threshold and Exec Threshold.
   If the probability of an edge to a successor of the current basic block is
   lower than Branch Threshold or its count is lower than Exec Threshold,
   then the successor will be a seed in one of the next rounds.
   Each round uses lower thresholds than the previous one.
   The last round has to have these parameters set to zero so that the
   remaining blocks are picked up.

   The algorithm selects the most probable successor from all unvisited
   successors and successors that have been added to this trace.
   The other successors (those that have not been "sent" to the next round)
   will be other seeds for this round, and the secondary traces will start
   from them.
   If the successor has not been visited in this trace, it is added to the
   trace (however, there is some heuristic for simple branches).
   If the successor has been visited in this trace, a loop has been found.
   If the loop has many iterations, the loop is rotated so that the source
   block of the most probable edge going out of the loop is the last block
   of the trace.
   If the loop has few iterations and there is no edge from the last block of
   the loop going out of the loop, the loop header is duplicated.

   When connecting traces, the algorithm first checks whether there is an edge
   from the last block of a trace to the first block of another trace.
   When there are still some unconnected traces it checks whether there exists
   a basic block BB such that BB is a successor of the last block of a trace
   and BB is a predecessor of the first block of another trace.  In this case,
   BB is duplicated, added at the end of the first trace and the traces are
   connected through it.
   The remaining traces are simply concatenated, so there will be a jump to
   the beginning of each remaining trace.

   The above description is for the full algorithm, which is used when the
   function is optimized for speed.  When the function is optimized for size,
   in order to reduce long jumps and connect more fallthru edges, the
   algorithm is modified as follows:
   (1) Break long traces into short ones.  A trace is broken at a block that
   has multiple predecessors/successors during trace discovery.  When
   connecting traces, only connect Trace n with Trace n + 1.  This change
   removes most long jumps compared with the above algorithm.
   (2) Ignore the edge probability and count for fallthru edges.
   (3) Keep the original order of blocks when there is no chance to fall
   through.  We rely on the results of cfg_cleanup.

   To implement these changes for code size optimization, the block's index is
   used as the key and all traces are found in one round.

   References:

   "Software Trace Cache"
   A. Ramirez, J. Larriba-Pey, C. Navarro, J. Torrellas and M. Valero; 1999
   http://citeseer.nj.nec.com/15361.html

*/
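/* As a rough, illustrative sketch only (the real implementation below
   differs in many details), the flow described above is approximately:

     find_traces ():
       seed the heap with the successors of the entry block, keyed by
       bb_to_key ()
       for each round, with decreasing branch/exec thresholds:
         while the heap is not empty:
           take the block with the smallest key
           if it is not hot enough for this round, move it to the heap
             for the next round
           otherwise start a trace there and keep appending the best
             successor (see better_edge_p); successors that fail the
             thresholds become seeds for later rounds
     connect_traces ():
       stitch the traces together, preferring fallthru edges and
       otherwise duplicating a small block that joins two traces.  */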

#include "config.h"
#define INCLUDE_ALGORITHM /* stable_sort */
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "target.h"
#include "rtl.h"
#include "tree.h"
#include "cfghooks.h"
#include "df.h"
#include "memmodel.h"
#include "optabs.h"
#include "regs.h"
#include "emit-rtl.h"
#include "output.h"
#include "expr.h"
#include "params.h"
#include "tree-pass.h"
#include "cfgrtl.h"
#include "cfganal.h"
#include "cfgbuild.h"
#include "cfgcleanup.h"
#include "bb-reorder.h"
#include "except.h"
#include "fibonacci_heap.h"
#include "stringpool.h"
#include "attribs.h"

/* The number of rounds.  In most cases there will only be 4 rounds, but
   when partitioning hot and cold basic blocks into separate sections of
   the object file there will be an extra round.  */
#define N_ROUNDS 5

struct target_bb_reorder default_target_bb_reorder;
#if SWITCHABLE_TARGET
struct target_bb_reorder *this_target_bb_reorder = &default_target_bb_reorder;
#endif

#define uncond_jump_length \
  (this_target_bb_reorder->x_uncond_jump_length)

/* Branch thresholds in thousandths (per mille) of the REG_BR_PROB_BASE.  */
static const int branch_threshold[N_ROUNDS] = {400, 200, 100, 0, 0};

/* Exec thresholds in thousandths (per mille) of the count of bb 0.  */
static const int exec_threshold[N_ROUNDS] = {500, 200, 50, 0, 0};

/* If the edge count is lower than DUPLICATION_THRESHOLD per mille of the
   entry block count, the edge destination is not duplicated while
   connecting traces.  */
#define DUPLICATION_THRESHOLD 100
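/* Reading the tables above: roughly speaking, an edge or block is hot
   enough for the first round only if the branch probability is at least
   400/1000 = 40% and the relevant count is at least 500/1000 = 50% of the
   entry block count; the last rounds use zero thresholds so every
   remaining block is picked up.  Likewise, while connecting traces a
   destination is duplicated only when the edge count reaches at least
   DUPLICATION_THRESHOLD/1000 = 10% of the entry block count.  */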

typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;

/* Structure to hold needed information for each basic block.  */
struct bbro_basic_block_data
{
  /* Which trace is the bb start of (-1 means it is not a start of any).  */
  int start_of_trace;

  /* Which trace is the bb end of (-1 means it is not an end of any).  */
  int end_of_trace;

  /* Which trace is the bb in?  */
  int in_trace;

  /* Which trace was this bb visited in?  */
  int visited;

  /* Cached maximum frequency of interesting incoming edges.
     Minus one means not yet computed.  */
  int priority;

  /* Which heap is BB in (if any)?  */
  bb_heap_t *heap;

  /* Which heap node is BB in (if any)?  */
  bb_heap_node_t *node;
};

/* The current size of the following dynamic array.  */
static int array_size;

/* The array which holds needed information for basic blocks.  */
static bbro_basic_block_data *bbd;

/* To avoid frequent reallocation the size of the array is greater than
   needed; the number of elements is (not less than) 1.25 * size_wanted.  */
#define GET_ARRAY_SIZE(X) ((((X) / 4) + 1) * 5)
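/* For instance, GET_ARRAY_SIZE (10) = ((10 / 4) + 1) * 5 = 15 and
   GET_ARRAY_SIZE (16) = 25, both at least 1.25 times the requested size,
   so the array only has to be reallocated (see copy_bb below) once the
   block count outgrows the padded size.  */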

/* Free the memory and set the pointer to NULL.  */
#define FREE(P) (gcc_assert (P), free (P), P = 0)

/* Structure for holding information about a trace.  */
struct trace
{
  /* First and last basic block of the trace.  */
  basic_block first, last;

  /* The round of the STC creation which this trace was found in.  */
  int round;

  /* The length (i.e. the number of basic blocks) of the trace.  */
  int length;
};

/* Maximum count of one of the entry blocks.  */
static profile_count max_entry_count;

/* Local function prototypes.  */
static void find_traces_1_round (int, profile_count, struct trace *, int *,
				 int, bb_heap_t **, int);
static basic_block copy_bb (basic_block, edge, basic_block, int);
static long bb_to_key (basic_block);
static bool better_edge_p (const_basic_block, const_edge, profile_probability,
			   profile_count, profile_probability, profile_count,
			   const_edge);
static bool copy_bb_p (const_basic_block, int);

/* Return the trace number in which BB was visited.  */

static int
bb_visited_trace (const_basic_block bb)
{
  gcc_assert (bb->index < array_size);
  return bbd[bb->index].visited;
}

/* Mark BB as visited in trace number TRACE.  */

static void
mark_bb_visited (basic_block bb, int trace)
{
  bbd[bb->index].visited = trace;
  if (bbd[bb->index].heap)
    {
      bbd[bb->index].heap->delete_node (bbd[bb->index].node);
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;
    }
}

/* Check to see if bb should be pushed into the next round of trace
   collections or not.  Reasons for pushing the block forward are:
   (1) the block is cold, we are doing partitioning, and there will be
   another round (cold partition blocks are not supposed to be collected
   into traces until the very last round); or (2) there will be another
   round, and the basic block is not "hot enough" for the current round
   of trace collection.  */

static bool
push_to_next_round_p (const_basic_block bb, int round, int number_of_rounds,
		      profile_count count_th)
{
  bool there_exists_another_round;
  bool block_not_hot_enough;

  there_exists_another_round = round < number_of_rounds - 1;

  block_not_hot_enough = (bb->count < count_th
			  || probably_never_executed_bb_p (cfun, bb));

  if (there_exists_another_round
      && block_not_hot_enough)
    return true;
  else
    return false;
}

/* Find the traces for Software Trace Cache.  Chain each trace through
   RBI()->next.  Store the number of traces to N_TRACES and description of
   traces to TRACES.  */

static void
find_traces (int *n_traces, struct trace *traces)
{
  int i;
  int number_of_rounds;
  edge e;
  edge_iterator ei;
  bb_heap_t *heap = new bb_heap_t (LONG_MIN);

  /* Add one extra round of trace collection when partitioning hot/cold
     basic blocks into separate sections.  The last round is for all the
     cold blocks (and ONLY the cold blocks).  */

  number_of_rounds = N_ROUNDS - 1;

  /* Insert entry points of function into heap.  */
  max_entry_count = profile_count::zero ();
  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
    {
      bbd[e->dest->index].heap = heap;
      bbd[e->dest->index].node = heap->insert (bb_to_key (e->dest), e->dest);
      if (e->dest->count > max_entry_count)
	max_entry_count = e->dest->count;
    }

  /* Find the traces.  */
  for (i = 0; i < number_of_rounds; i++)
    {
      profile_count count_threshold;

      if (dump_file)
	fprintf (dump_file, "STC - round %d\n", i + 1);

      count_threshold = max_entry_count.apply_scale (exec_threshold[i], 1000);

      find_traces_1_round (REG_BR_PROB_BASE * branch_threshold[i] / 1000,
			   count_threshold, traces, n_traces, i, &heap,
			   number_of_rounds);
    }
  delete heap;

  if (dump_file)
    {
      for (i = 0; i < *n_traces; i++)
	{
	  basic_block bb;
	  fprintf (dump_file, "Trace %d (round %d):  ", i + 1,
		   traces[i].round + 1);
	  for (bb = traces[i].first;
	       bb != traces[i].last;
	       bb = (basic_block) bb->aux)
	    {
	      fprintf (dump_file, "%d [", bb->index);
	      bb->count.dump (dump_file);
	      fprintf (dump_file, "] ");
	    }
	  fprintf (dump_file, "%d [", bb->index);
	  bb->count.dump (dump_file);
	  fprintf (dump_file, "]\n");
	}
      fflush (dump_file);
    }
}

/* Rotate a loop whose back edge is BACK_EDGE, in the tail of trace TRACE
   (with sequential number TRACE_N).  */

static basic_block
rotate_loop (edge back_edge, struct trace *trace, int trace_n)
{
  basic_block bb;

  /* Information about the best end (end after rotation) of the loop.  */
  basic_block best_bb = NULL;
  edge best_edge = NULL;
  profile_count best_count = profile_count::uninitialized ();
  /* The best edge is preferred when its destination is not visited yet
     or is a start block of some trace.  */
  bool is_preferred = false;

  /* Find the most frequent edge that goes out of the current trace.  */
  bb = back_edge->dest;
  do
    {
      edge e;
      edge_iterator ei;

      FOR_EACH_EDGE (e, ei, bb->succs)
	if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
	    && bb_visited_trace (e->dest) != trace_n
	    && (e->flags & EDGE_CAN_FALLTHRU)
	    && !(e->flags & EDGE_COMPLEX))
	{
	  if (is_preferred)
	    {
	      /* The best edge is preferred.  */
	      if (!bb_visited_trace (e->dest)
		  || bbd[e->dest->index].start_of_trace >= 0)
		{
		  /* The current edge E is also preferred.  */
		  if (e->count () > best_count)
		    {
		      best_count = e->count ();
		      best_edge = e;
		      best_bb = bb;
		    }
		}
	    }
	  else
	    {
	      if (!bb_visited_trace (e->dest)
		  || bbd[e->dest->index].start_of_trace >= 0)
		{
		  /* The current edge E is preferred.  */
		  is_preferred = true;
		  best_count = e->count ();
		  best_edge = e;
		  best_bb = bb;
		}
	      else
		{
		  if (!best_edge || e->count () > best_count)
		    {
		      best_count = e->count ();
		      best_edge = e;
		      best_bb = bb;
		    }
		}
	    }
	}
      bb = (basic_block) bb->aux;
    }
  while (bb != back_edge->dest);

  if (best_bb)
    {
      /* Rotate the loop so that the BEST_EDGE goes out from the last block of
	 the trace.  */
      if (back_edge->dest == trace->first)
	{
	  trace->first = (basic_block) best_bb->aux;
	}
      else
	{
	  basic_block prev_bb;

	  for (prev_bb = trace->first;
	       prev_bb->aux != back_edge->dest;
	       prev_bb = (basic_block) prev_bb->aux)
	    ;
	  prev_bb->aux = best_bb->aux;

	  /* Try to get rid of an unconditional jump to a conditional jump.  */
	  if (single_succ_p (prev_bb))
	    {
	      basic_block header = single_succ (prev_bb);

	      /* Duplicate HEADER if it is a small block containing a cond jump
		 at the end.  */
	      if (any_condjump_p (BB_END (header)) && copy_bb_p (header, 0)
		  && !CROSSING_JUMP_P (BB_END (header)))
		copy_bb (header, single_succ_edge (prev_bb), prev_bb, trace_n);
	    }
	}
    }
  else
    {
      /* We have not found a suitable loop tail, so do no rotation.  */
      best_bb = back_edge->src;
    }
  best_bb->aux = NULL;
  return best_bb;
}
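/* A small made-up example of the rotation above: if the trace currently
   ends with the loop H -> A -> B (back edge B -> H) and the most frequent
   edge leaving the loop goes out of A, the chain is re-linked so that the
   loop is entered at B and the trace becomes ... -> B -> H -> A, leaving A,
   the source of the likely exit, as the last block.  The block that used to
   fall through into H now needs a jump to H, which is why a small header
   ending in a conditional jump may be duplicated after it above.  */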

/* One round of finding traces.  Find traces for BRANCH_TH and COUNT_TH, i.e.
   do not include in traces basic blocks whose successor edge probability is
   lower than BRANCH_TH or whose count is lower than COUNT_TH.  Store the new
   traces into TRACES and modify the number of traces *N_TRACES.  Set the
   round (which the trace belongs to) to ROUND.  The function expects starting
   basic blocks to be in *HEAP and will delete *HEAP and store starting points
   for the next round into new *HEAP.  */

static void
find_traces_1_round (int branch_th, profile_count count_th,
		     struct trace *traces, int *n_traces, int round,
		     bb_heap_t **heap, int number_of_rounds)
{
  /* Heap for discarded basic blocks which are possible starting points for
     the next round.  */
  bb_heap_t *new_heap = new bb_heap_t (LONG_MIN);
  bool for_size = optimize_function_for_size_p (cfun);

  while (!(*heap)->empty ())
    {
      basic_block bb;
      struct trace *trace;
      edge best_edge, e;
      long key;
      edge_iterator ei;

      bb = (*heap)->extract_min ();
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;

      if (dump_file)
	fprintf (dump_file, "Getting bb %d\n", bb->index);

      /* If the BB's count is too low, send BB to the next round.  When
	 partitioning hot/cold blocks into separate sections, make sure all
	 the cold blocks (and ONLY the cold blocks) go into the (extra) final
	 round.  When optimizing for size, do not push to next round.  */

      if (!for_size
	  && push_to_next_round_p (bb, round, number_of_rounds,
				   count_th))
	{
	  int key = bb_to_key (bb);
	  bbd[bb->index].heap = new_heap;
	  bbd[bb->index].node = new_heap->insert (key, bb);

	  if (dump_file)
	    fprintf (dump_file,
		     "  Possible start point of next round: %d (key: %d)\n",
		     bb->index, key);
	  continue;
	}

      trace = traces + *n_traces;
      trace->first = bb;
      trace->round = round;
      trace->length = 0;
      bbd[bb->index].in_trace = *n_traces;
      (*n_traces)++;

      do
	{
	  bool ends_in_call;

	  /* The probability and count of the best edge.  */
	  profile_probability best_prob = profile_probability::uninitialized ();
	  profile_count best_count = profile_count::uninitialized ();

	  best_edge = NULL;
	  mark_bb_visited (bb, *n_traces);
	  trace->length++;

	  if (dump_file)
	    fprintf (dump_file, "Basic block %d was visited in trace %d\n",
		     bb->index, *n_traces);

	  ends_in_call = block_ends_with_call_p (bb);

	  /* Select the successor that will be placed after BB.  */
	  FOR_EACH_EDGE (e, ei, bb->succs)
	    {
	      gcc_assert (!(e->flags & EDGE_FAKE));

	      if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
		continue;

	      if (bb_visited_trace (e->dest)
		  && bb_visited_trace (e->dest) != *n_traces)
		continue;

	      /* If partitioning hot/cold basic blocks, don't consider edges
		 that cross section boundaries.  */
	      if (BB_PARTITION (e->dest) != BB_PARTITION (bb))
		continue;

	      profile_probability prob = e->probability;
	      profile_count count = e->dest->count;

	      /* The only sensible preference for a call instruction is the
		 fallthru edge.  Don't bother selecting anything else.  */
	      if (ends_in_call)
		{
		  if (e->flags & EDGE_CAN_FALLTHRU)
		    {
		      best_edge = e;
		      best_prob = prob;
		      best_count = count;
		    }
		  continue;
		}

	      /* An edge that cannot be a fallthru, or an improbable or
		 infrequent successor (i.e. an unsuitable successor).  When
		 optimizing for size, ignore the probability and count.  */
	      if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX)
		  || !prob.initialized_p ()
		  || ((prob.to_reg_br_prob_base () < branch_th
		      || e->count () < count_th) && (!for_size)))
		continue;

	      if (better_edge_p (bb, e, prob, count, best_prob, best_count,
				 best_edge))
		{
		  best_edge = e;
		  best_prob = prob;
		  best_count = count;
		}
	    }

	  /* If the best destination has multiple predecessors and can be
	     duplicated cheaper than a jump, don't allow it to be added to
	     a trace; we'll duplicate it when connecting the traces later.
	     However, we need to check that this duplication wouldn't leave
	     the best destination with only crossing predecessors, because
	     this would change its effective partition from hot to cold.  */
	  if (best_edge
	      && EDGE_COUNT (best_edge->dest->preds) >= 2
	      && copy_bb_p (best_edge->dest, 0))
	    {
	      bool only_crossing_preds = true;
	      edge e;
	      edge_iterator ei;
	      FOR_EACH_EDGE (e, ei, best_edge->dest->preds)
		if (e != best_edge && !(e->flags & EDGE_CROSSING))
		  {
		    only_crossing_preds = false;
		    break;
		  }
	      if (!only_crossing_preds)
		best_edge = NULL;
	    }

	  /* If the best destination has multiple successors or predecessors,
	     don't allow it to be added when optimizing for size.  This makes
	     sure predecessors with smaller index are handled before the best
	     destination.  It breaks long traces and reduces long jumps.

	     Take if-then-else as an example.
		A
	       / \
	      B   C
	       \ /
		D
	     If we do not remove the best edge B->D/C->D, the final order might
	     be A B D ... C.  C is at the end of the program.  If D's successors
	     and D are complicated, we might need long jumps for A->C and C->D.
	     Similar issue for order: A C D ... B.

	     After removing the best edge, the final result will be ABCD/ACBD.
	     It does not add a jump compared with the previous order, but it
	     reduces the possibility of long jumps.  */
	  if (best_edge && for_size
	      && (EDGE_COUNT (best_edge->dest->succs) > 1
		 || EDGE_COUNT (best_edge->dest->preds) > 1))
	    best_edge = NULL;

	  /* Add all non-selected successors to the heaps.  */
	  FOR_EACH_EDGE (e, ei, bb->succs)
	    {
	      if (e == best_edge
		  || e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
		  || bb_visited_trace (e->dest))
		continue;

	      key = bb_to_key (e->dest);

	      if (bbd[e->dest->index].heap)
		{
		  /* E->DEST is already in some heap.  */
		  if (key != bbd[e->dest->index].node->get_key ())
		    {
		      if (dump_file)
			{
			  fprintf (dump_file,
				   "Changing key for bb %d from %ld to %ld.\n",
				   e->dest->index,
				   (long) bbd[e->dest->index].node->get_key (),
				   key);
			}
		      bbd[e->dest->index].heap->replace_key
		        (bbd[e->dest->index].node, key);
		    }
		}
	      else
		{
		  bb_heap_t *which_heap = *heap;

		  profile_probability prob = e->probability;

		  if (!(e->flags & EDGE_CAN_FALLTHRU)
		      || (e->flags & EDGE_COMPLEX)
		      || !prob.initialized_p ()
		      || prob.to_reg_br_prob_base () < branch_th
		      || e->count () < count_th)
		    {
		      /* When partitioning hot/cold basic blocks, make sure
			 the cold blocks (and only the cold blocks) all get
			 pushed to the last round of trace collection.  When
			 optimizing for size, do not push to next round.  */

		      if (!for_size && push_to_next_round_p (e->dest, round,
							     number_of_rounds,
							     count_th))
			which_heap = new_heap;
		    }

		  bbd[e->dest->index].heap = which_heap;
		  bbd[e->dest->index].node = which_heap->insert (key, e->dest);

		  if (dump_file)
		    {
		      fprintf (dump_file,
			       "  Possible start of %s round: %d (key: %ld)\n",
			       (which_heap == new_heap) ? "next" : "this",
			       e->dest->index, (long) key);
		    }

		}
	    }

	  if (best_edge) /* Suitable successor was found.  */
	    {
	      if (bb_visited_trace (best_edge->dest) == *n_traces)
		{
		  /* We do nothing with one-basic-block loops.  */
		  if (best_edge->dest != bb)
		    {
		      if (best_edge->count ()
			  > best_edge->dest->count.apply_scale (4, 5))
			{
			  /* The loop has at least 4 iterations.  If the loop
			     header is not the first block of the function
			     we can rotate the loop.  */

			  if (best_edge->dest
			      != ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
			    {
			      if (dump_file)
				{
				  fprintf (dump_file,
					   "Rotating loop %d - %d\n",
					   best_edge->dest->index, bb->index);
				}
			      bb->aux = best_edge->dest;
			      bbd[best_edge->dest->index].in_trace =
							     (*n_traces) - 1;
			      bb = rotate_loop (best_edge, trace, *n_traces);
			    }
			}
		      else
			{
			  /* The loop has fewer than 4 iterations.  */

			  if (single_succ_p (bb)
			      && copy_bb_p (best_edge->dest,
			      		    optimize_edge_for_speed_p
			      		    (best_edge)))
			    {
			      bb = copy_bb (best_edge->dest, best_edge, bb,
					    *n_traces);
			      trace->length++;
			    }
			}
		    }

		  /* Terminate the trace.  */
		  break;
		}
	      else
		{
		  /* Check for a situation

		    A
		   /|
		  B |
		   \|
		    C

		  where
		  AB->count () + BC->count () >= AC->count ().
		  (i.e. 2 * B->count >= AC->count )
		  Best ordering is then A B C.

		  When optimizing for size, A B C is always the best order.

		  This situation is created for example by:

		  if (A) B;
		  C;

		  */

		  FOR_EACH_EDGE (e, ei, bb->succs)
		    if (e != best_edge
			&& (e->flags & EDGE_CAN_FALLTHRU)
			&& !(e->flags & EDGE_COMPLEX)
			&& !bb_visited_trace (e->dest)
			&& single_pred_p (e->dest)
			&& !(e->flags & EDGE_CROSSING)
			&& single_succ_p (e->dest)
			&& (single_succ_edge (e->dest)->flags
			    & EDGE_CAN_FALLTHRU)
			&& !(single_succ_edge (e->dest)->flags & EDGE_COMPLEX)
			&& single_succ (e->dest) == best_edge->dest
			&& (e->dest->count.apply_scale (2, 1)
			    >= best_edge->count () || for_size))
		      {
			best_edge = e;
			if (dump_file)
			  fprintf (dump_file, "Selecting BB %d\n",
				   best_edge->dest->index);
			break;
		      }

		  bb->aux = best_edge->dest;
		  bbd[best_edge->dest->index].in_trace = (*n_traces) - 1;
		  bb = best_edge->dest;
		}
	    }
	}
      while (best_edge);
      trace->last = bb;
      bbd[trace->first->index].start_of_trace = *n_traces - 1;
      if (bbd[trace->last->index].end_of_trace != *n_traces - 1)
	{
	  bbd[trace->last->index].end_of_trace = *n_traces - 1;
	  /* Update the cached maximum frequency for interesting predecessor
	     edges for successors of the new trace end.  */
	  FOR_EACH_EDGE (e, ei, trace->last->succs)
	    if (EDGE_FREQUENCY (e) > bbd[e->dest->index].priority)
	      bbd[e->dest->index].priority = EDGE_FREQUENCY (e);
	}

      /* The trace is terminated so we have to recount the keys in heap
	 (some block can have a lower key because now one of its predecessors
	 is an end of the trace).  */
      FOR_EACH_EDGE (e, ei, bb->succs)
	{
	  if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
	      || bb_visited_trace (e->dest))
	    continue;

	  if (bbd[e->dest->index].heap)
	    {
	      key = bb_to_key (e->dest);
	      if (key != bbd[e->dest->index].node->get_key ())
		{
		  if (dump_file)
		    {
		      fprintf (dump_file,
			       "Changing key for bb %d from %ld to %ld.\n",
			       e->dest->index,
			       (long) bbd[e->dest->index].node->get_key (), key);
		    }
		  bbd[e->dest->index].heap->replace_key
		    (bbd[e->dest->index].node, key);
		}
	    }
	}
    }

  delete (*heap);

  /* "Return" the new heap.  */
  *heap = new_heap;
}

/* Create a duplicate of the basic block OLD_BB and redirect edge E to it; add
   it to the trace after BB, mark OLD_BB visited and update the pass's data
   structures (TRACE is the number of the trace to which OLD_BB is
   duplicated).  */

static basic_block
copy_bb (basic_block old_bb, edge e, basic_block bb, int trace)
{
  basic_block new_bb;

  new_bb = duplicate_block (old_bb, e, bb);
  BB_COPY_PARTITION (new_bb, old_bb);

  gcc_assert (e->dest == new_bb);

  if (dump_file)
    fprintf (dump_file,
	     "Duplicated bb %d (created bb %d)\n",
	     old_bb->index, new_bb->index);

  if (new_bb->index >= array_size
      || last_basic_block_for_fn (cfun) > array_size)
    {
      int i;
      int new_size;

      new_size = MAX (last_basic_block_for_fn (cfun), new_bb->index + 1);
      new_size = GET_ARRAY_SIZE (new_size);
      bbd = XRESIZEVEC (bbro_basic_block_data, bbd, new_size);
      for (i = array_size; i < new_size; i++)
	{
	  bbd[i].start_of_trace = -1;
	  bbd[i].end_of_trace = -1;
	  bbd[i].in_trace = -1;
	  bbd[i].visited = 0;
	  bbd[i].priority = -1;
	  bbd[i].heap = NULL;
	  bbd[i].node = NULL;
	}
      array_size = new_size;

      if (dump_file)
	{
	  fprintf (dump_file,
		   "Growing the dynamic array to %d elements.\n",
		   array_size);
	}
    }

  gcc_assert (!bb_visited_trace (e->dest));
  mark_bb_visited (new_bb, trace);
  new_bb->aux = bb->aux;
  bb->aux = new_bb;

  bbd[new_bb->index].in_trace = trace;

  return new_bb;
}

/* Compute and return the key (for the heap) of the basic block BB.  */

static long
bb_to_key (basic_block bb)
{
  edge e;
  edge_iterator ei;

  /* Use index as key to align with its original order.  */
  if (optimize_function_for_size_p (cfun))
    return bb->index;

  /* Do not start in probably never executed blocks.  */

  if (BB_PARTITION (bb) == BB_COLD_PARTITION
      || probably_never_executed_bb_p (cfun, bb))
    return BB_FREQ_MAX;

  /* Prefer blocks whose predecessor is an end of some trace
     or whose predecessor edge is EDGE_DFS_BACK.  */
  int priority = bbd[bb->index].priority;
  if (priority == -1)
    {
      priority = 0;
      FOR_EACH_EDGE (e, ei, bb->preds)
	{
	  if ((e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
	       && bbd[e->src->index].end_of_trace >= 0)
	      || (e->flags & EDGE_DFS_BACK))
	    {
	      int edge_freq = EDGE_FREQUENCY (e);

	      if (edge_freq > priority)
		priority = edge_freq;
	    }
	}
      bbd[bb->index].priority = priority;
    }

  if (priority)
    /* A block with priority should have a significantly lower key.  */
    return -(100 * BB_FREQ_MAX + 100 * priority + bb->count.to_frequency (cfun));

  return -bb->count.to_frequency (cfun);
}
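/* The heap used in find_traces is a min-heap, so smaller keys are taken
   first.  Roughly, when optimizing for speed this gives the following
   order: blocks whose predecessor just ended a trace (or is reached over a
   DFS back edge) come first, with keys around -(100 * BB_FREQ_MAX + ...),
   then ordinary blocks by decreasing frequency (key -frequency), and
   probably never executed or cold-partition blocks last (key BB_FREQ_MAX).
   When optimizing for size the key is simply the block index, preserving
   the original order.  */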

/* Return true when the edge E from basic block BB is better than the temporary
   best edge (details are in function).  The probability of edge E is PROB. The
   count of the successor is COUNT.  The current best probability is
   BEST_PROB, the best count is BEST_COUNT.
   The edge is considered to be equivalent when PROB does not differ much from
   BEST_PROB; similarly for count.  */

static bool
better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
	       profile_count count, profile_probability best_prob,
	       profile_count best_count, const_edge cur_best_edge)
{
  bool is_better_edge;

  /* The BEST_* values do not have to be best, but can be a bit smaller than
     maximum values.  */
  profile_probability diff_prob = best_prob.apply_scale (1, 10);

  /* The smaller one is better to keep the original order.  */
  if (optimize_function_for_size_p (cfun))
    return !cur_best_edge
	   || cur_best_edge->dest->index > e->dest->index;

  /* Those edges are so expensive that continuing a trace is not useful
     performance-wise.  */
  if (e->flags & (EDGE_ABNORMAL | EDGE_EH))
    return false;

  if (prob > best_prob + diff_prob
      || (!best_prob.initialized_p ()
	  && prob > profile_probability::guessed_never ()))
    /* The edge has higher probability than the temporary best edge.  */
    is_better_edge = true;
  else if (prob < best_prob - diff_prob)
    /* The edge has lower probability than the temporary best edge.  */
    is_better_edge = false;
  else
    {
      profile_count diff_count = best_count.apply_scale (1, 10);
      if (count < best_count - diff_count
	  || (!best_count.initialized_p ()
	      && count.nonzero_p ()))
	/* The edge and the temporary best edge have almost equivalent
	   probabilities.  A higher count of the successor means that
	   there is another edge going into that successor.
	   This successor has a lower count, so it is better.  */
	is_better_edge = true;
      else if (count > best_count + diff_count)
	/* This successor has a higher count, so it is worse.  */
	is_better_edge = false;
      else if (e->dest->prev_bb == bb)
	/* The edges have equivalent probabilities and the successors
	   have equivalent counts.  Select the previous successor.  */
	is_better_edge = true;
      else
	is_better_edge = false;
    }

  return is_better_edge;
}
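/* Worked example of the 10% bands above: if the current best edge has
   probability 60%, diff_prob is 6%, so a competing edge only wins outright
   above 66% and only loses outright below 54%; in between, the decision
   falls through to the count comparison (with its own 10% band, where a
   lower successor count is better) and finally to preferring the successor
   that already follows BB in the current block order.  */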

/* Return true when the edge E is better than the temporary best edge
   CUR_BEST_EDGE.  If SRC_INDEX_P is true, the function compares the src bb of
   E and CUR_BEST_EDGE; otherwise it will compare the dest bb.
   BEST_LEN is the trace length of src (or dest) bb in CUR_BEST_EDGE.
   TRACES records the information about traces.
   When optimizing for size, the edge with smaller index is better.
   When optimizing for speed, the edge with bigger probability or longer trace
   is better.  */

static bool
connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
		       const_edge cur_best_edge, struct trace *traces)
{
  int e_index;
  int b_index;
  bool is_better_edge;

  if (!cur_best_edge)
    return true;

  if (optimize_function_for_size_p (cfun))
    {
      e_index = src_index_p ? e->src->index : e->dest->index;
      b_index = src_index_p ? cur_best_edge->src->index
			      : cur_best_edge->dest->index;
      /* The smaller one is better to keep the original order.  */
      return b_index > e_index;
    }

  if (src_index_p)
    {
      e_index = e->src->index;

      /* We are looking for a predecessor, so probabilities are not that
	 informative.  We do not want to connect A to B because A has
	 only one successor (probability is 100%) while there is an edge
	 A' to B where the probability is 90% but which is much more
	 frequent.  */
      if (e->count () > cur_best_edge->count ())
	/* The edge has a higher count than the temporary best edge.  */
	is_better_edge = true;
      else if (e->count () < cur_best_edge->count ())
	/* The edge has a lower count than the temporary best edge.  */
	is_better_edge = false;
      else if (e->probability > cur_best_edge->probability)
	/* The edge has higher probability than the temporary best edge.  */
	is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
	/* The edge has lower probability than the temporary best edge.  */
	is_better_edge = false;
      else if (traces[bbd[e_index].end_of_trace].length > best_len)
	/* The edge and the temporary best edge have equivalent probabilities.
	   The edge with longer trace is better.  */
	is_better_edge = true;
      else
	is_better_edge = false;
    }
  else
    {
      e_index = e->dest->index;

      if (e->probability > cur_best_edge->probability)
	/* The edge has higher probability than the temporary best edge.  */
	is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
	/* The edge has lower probability than the temporary best edge.  */
	is_better_edge = false;
      else if (traces[bbd[e_index].start_of_trace].length > best_len)
	/* The edge and the temporary best edge have equivalent probabilities.
	   The edge with longer trace is better.  */
	is_better_edge = true;
      else
	is_better_edge = false;
    }

  return is_better_edge;
}
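/* For instance, when optimizing for size an edge into block 7 beats an
   edge into block 12 regardless of probability, which keeps the layout
   close to the original block order; when optimizing for speed the
   comparison uses edge probability (and, when searching for a predecessor,
   the edge count as described in the comment above) and, on a tie, prefers
   the edge attached to the longer trace.  */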
1068*38fd1498Szrj 
1069*38fd1498Szrj /* Connect traces in array TRACES, N_TRACES is the count of traces.  */
1070*38fd1498Szrj 
1071*38fd1498Szrj static void
1072*38fd1498Szrj connect_traces (int n_traces, struct trace *traces)
1073*38fd1498Szrj {
1074*38fd1498Szrj   int i;
1075*38fd1498Szrj   bool *connected;
1076*38fd1498Szrj   bool two_passes;
1077*38fd1498Szrj   int last_trace;
1078*38fd1498Szrj   int current_pass;
1079*38fd1498Szrj   int current_partition;
1080*38fd1498Szrj   profile_count count_threshold;
1081*38fd1498Szrj   bool for_size = optimize_function_for_size_p (cfun);
1082*38fd1498Szrj 
1083*38fd1498Szrj   count_threshold = max_entry_count.apply_scale (DUPLICATION_THRESHOLD, 1000);
1084*38fd1498Szrj 
1085*38fd1498Szrj   connected = XCNEWVEC (bool, n_traces);
1086*38fd1498Szrj   last_trace = -1;
1087*38fd1498Szrj   current_pass = 1;
1088*38fd1498Szrj   current_partition = BB_PARTITION (traces[0].first);
1089*38fd1498Szrj   two_passes = false;
1090*38fd1498Szrj 
1091*38fd1498Szrj   if (crtl->has_bb_partition)
1092*38fd1498Szrj     for (i = 0; i < n_traces && !two_passes; i++)
1093*38fd1498Szrj       if (BB_PARTITION (traces[0].first)
1094*38fd1498Szrj 	  != BB_PARTITION (traces[i].first))
1095*38fd1498Szrj 	two_passes = true;
1096*38fd1498Szrj 
1097*38fd1498Szrj   for (i = 0; i < n_traces || (two_passes && current_pass == 1) ; i++)
1098*38fd1498Szrj     {
1099*38fd1498Szrj       int t = i;
1100*38fd1498Szrj       int t2;
1101*38fd1498Szrj       edge e, best;
1102*38fd1498Szrj       int best_len;
1103*38fd1498Szrj 
1104*38fd1498Szrj       if (i >= n_traces)
1105*38fd1498Szrj 	{
1106*38fd1498Szrj 	  gcc_assert (two_passes && current_pass == 1);
1107*38fd1498Szrj 	  i = 0;
1108*38fd1498Szrj 	  t = i;
1109*38fd1498Szrj 	  current_pass = 2;
1110*38fd1498Szrj 	  if (current_partition == BB_HOT_PARTITION)
1111*38fd1498Szrj 	    current_partition = BB_COLD_PARTITION;
1112*38fd1498Szrj 	  else
1113*38fd1498Szrj 	    current_partition = BB_HOT_PARTITION;
1114*38fd1498Szrj 	}
1115*38fd1498Szrj 
1116*38fd1498Szrj       if (connected[t])
1117*38fd1498Szrj 	continue;
1118*38fd1498Szrj 
1119*38fd1498Szrj       if (two_passes
1120*38fd1498Szrj 	  && BB_PARTITION (traces[t].first) != current_partition)
1121*38fd1498Szrj 	continue;
1122*38fd1498Szrj 
1123*38fd1498Szrj       connected[t] = true;
1124*38fd1498Szrj 
1125*38fd1498Szrj       /* Find the predecessor traces.  */
1126*38fd1498Szrj       for (t2 = t; t2 > 0;)
1127*38fd1498Szrj 	{
1128*38fd1498Szrj 	  edge_iterator ei;
1129*38fd1498Szrj 	  best = NULL;
1130*38fd1498Szrj 	  best_len = 0;
1131*38fd1498Szrj 	  FOR_EACH_EDGE (e, ei, traces[t2].first->preds)
1132*38fd1498Szrj 	    {
1133*38fd1498Szrj 	      int si = e->src->index;
1134*38fd1498Szrj 
1135*38fd1498Szrj 	      if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
1136*38fd1498Szrj 		  && (e->flags & EDGE_CAN_FALLTHRU)
1137*38fd1498Szrj 		  && !(e->flags & EDGE_COMPLEX)
1138*38fd1498Szrj 		  && bbd[si].end_of_trace >= 0
1139*38fd1498Szrj 		  && !connected[bbd[si].end_of_trace]
1140*38fd1498Szrj 		  && (BB_PARTITION (e->src) == current_partition)
1141*38fd1498Szrj 		  && connect_better_edge_p (e, true, best_len, best, traces))
1142*38fd1498Szrj 		{
1143*38fd1498Szrj 		  best = e;
1144*38fd1498Szrj 		  best_len = traces[bbd[si].end_of_trace].length;
1145*38fd1498Szrj 		}
1146*38fd1498Szrj 	    }
1147*38fd1498Szrj 	  if (best)
1148*38fd1498Szrj 	    {
1149*38fd1498Szrj 	      best->src->aux = best->dest;
1150*38fd1498Szrj 	      t2 = bbd[best->src->index].end_of_trace;
1151*38fd1498Szrj 	      connected[t2] = true;
1152*38fd1498Szrj 
1153*38fd1498Szrj 	      if (dump_file)
1154*38fd1498Szrj 		{
1155*38fd1498Szrj 		  fprintf (dump_file, "Connection: %d %d\n",
1156*38fd1498Szrj 			   best->src->index, best->dest->index);
1157*38fd1498Szrj 		}
1158*38fd1498Szrj 	    }
1159*38fd1498Szrj 	  else
1160*38fd1498Szrj 	    break;
1161*38fd1498Szrj 	}
1162*38fd1498Szrj 
1163*38fd1498Szrj       if (last_trace >= 0)
1164*38fd1498Szrj 	traces[last_trace].last->aux = traces[t2].first;
1165*38fd1498Szrj       last_trace = t;
1166*38fd1498Szrj 
1167*38fd1498Szrj       /* Find the successor traces.  */
1168*38fd1498Szrj       while (1)
1169*38fd1498Szrj 	{
1170*38fd1498Szrj 	  /* Find the continuation of the chain.  */
1171*38fd1498Szrj 	  edge_iterator ei;
1172*38fd1498Szrj 	  best = NULL;
1173*38fd1498Szrj 	  best_len = 0;
1174*38fd1498Szrj 	  FOR_EACH_EDGE (e, ei, traces[t].last->succs)
1175*38fd1498Szrj 	    {
1176*38fd1498Szrj 	      int di = e->dest->index;
1177*38fd1498Szrj 
1178*38fd1498Szrj 	      if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1179*38fd1498Szrj 		  && (e->flags & EDGE_CAN_FALLTHRU)
1180*38fd1498Szrj 		  && !(e->flags & EDGE_COMPLEX)
1181*38fd1498Szrj 		  && bbd[di].start_of_trace >= 0
1182*38fd1498Szrj 		  && !connected[bbd[di].start_of_trace]
1183*38fd1498Szrj 		  && (BB_PARTITION (e->dest) == current_partition)
1184*38fd1498Szrj 		  && connect_better_edge_p (e, false, best_len, best, traces))
1185*38fd1498Szrj 		{
1186*38fd1498Szrj 		  best = e;
1187*38fd1498Szrj 		  best_len = traces[bbd[di].start_of_trace].length;
1188*38fd1498Szrj 		}
1189*38fd1498Szrj 	    }
1190*38fd1498Szrj 
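	  /* When optimizing for size, be more conservative: only glue
	     traces together when that keeps the block order close to the
	     original, which also helps to avoid long jumps.  */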
1191*38fd1498Szrj 	  if (for_size)
1192*38fd1498Szrj 	    {
1193*38fd1498Szrj 	      if (!best)
1194*38fd1498Szrj 		/* Stop finding the successor traces.  */
1195*38fd1498Szrj 		break;
1196*38fd1498Szrj 
1197*38fd1498Szrj 	      /* It is OK to connect block n with block n + 1 or a block
1198*38fd1498Szrj 		 before n.  For others, only connect to the loop header.  */
1199*38fd1498Szrj 	      if (best->dest->index > (traces[t].last->index + 1))
1200*38fd1498Szrj 		{
1201*38fd1498Szrj 		  int count = EDGE_COUNT (best->dest->preds);
1202*38fd1498Szrj 
1203*38fd1498Szrj 		  FOR_EACH_EDGE (e, ei, best->dest->preds)
1204*38fd1498Szrj 		    if (e->flags & EDGE_DFS_BACK)
1205*38fd1498Szrj 		      count--;
1206*38fd1498Szrj 
1207*38fd1498Szrj 		  /* If dest has multiple predecessors, skip it.  We expect
1208*38fd1498Szrj 		     that one predecessor with smaller index connects with it
1209*38fd1498Szrj 		     later.  */
1210*38fd1498Szrj 		  if (count != 1)
1211*38fd1498Szrj 		    break;
1212*38fd1498Szrj 		}
1213*38fd1498Szrj 
1214*38fd1498Szrj 	      /* Only connect Trace n with Trace n + 1.  It is conservative
1215*38fd1498Szrj 		 to keep the order as close as possible to the original order.
1216*38fd1498Szrj 		 It also helps to reduce long jumps.  */
1217*38fd1498Szrj 	      if (last_trace != bbd[best->dest->index].start_of_trace - 1)
1218*38fd1498Szrj 		break;
1219*38fd1498Szrj 
1220*38fd1498Szrj 	      if (dump_file)
1221*38fd1498Szrj 		fprintf (dump_file, "Connection: %d %d\n",
1222*38fd1498Szrj 			 best->src->index, best->dest->index);
1223*38fd1498Szrj 
1224*38fd1498Szrj 	      t = bbd[best->dest->index].start_of_trace;
1225*38fd1498Szrj 	      traces[last_trace].last->aux = traces[t].first;
1226*38fd1498Szrj 	      connected[t] = true;
1227*38fd1498Szrj 	      last_trace = t;
1228*38fd1498Szrj 	    }
1229*38fd1498Szrj 	  else if (best)
1230*38fd1498Szrj 	    {
1231*38fd1498Szrj 	      if (dump_file)
1232*38fd1498Szrj 		{
1233*38fd1498Szrj 		  fprintf (dump_file, "Connection: %d %d\n",
1234*38fd1498Szrj 			   best->src->index, best->dest->index);
1235*38fd1498Szrj 		}
1236*38fd1498Szrj 	      t = bbd[best->dest->index].start_of_trace;
1237*38fd1498Szrj 	      traces[last_trace].last->aux = traces[t].first;
1238*38fd1498Szrj 	      connected[t] = true;
1239*38fd1498Szrj 	      last_trace = t;
1240*38fd1498Szrj 	    }
1241*38fd1498Szrj 	  else
1242*38fd1498Szrj 	    {
1243*38fd1498Szrj 	      /* Try to connect the traces by duplicating one block.  */
1244*38fd1498Szrj 	      edge e2;
1245*38fd1498Szrj 	      basic_block next_bb = NULL;
1246*38fd1498Szrj 	      bool try_copy = false;
1247*38fd1498Szrj 
1248*38fd1498Szrj 	      FOR_EACH_EDGE (e, ei, traces[t].last->succs)
1249*38fd1498Szrj 		if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1250*38fd1498Szrj 		    && (e->flags & EDGE_CAN_FALLTHRU)
1251*38fd1498Szrj 		    && !(e->flags & EDGE_COMPLEX)
1252*38fd1498Szrj 		    && (!best || e->probability > best->probability))
1253*38fd1498Szrj 		  {
1254*38fd1498Szrj 		    edge_iterator ei;
1255*38fd1498Szrj 		    edge best2 = NULL;
1256*38fd1498Szrj 		    int best2_len = 0;
1257*38fd1498Szrj 
1258*38fd1498Szrj 		    /* If the destination is a start of a trace which is only
1259*38fd1498Szrj 		       one block long, then no need to search the successor
1260*38fd1498Szrj 		       blocks of the trace.  Accept it.  */
1261*38fd1498Szrj 		    if (bbd[e->dest->index].start_of_trace >= 0
1262*38fd1498Szrj 			&& traces[bbd[e->dest->index].start_of_trace].length
1263*38fd1498Szrj 			   == 1)
1264*38fd1498Szrj 		      {
1265*38fd1498Szrj 			best = e;
1266*38fd1498Szrj 			try_copy = true;
1267*38fd1498Szrj 			continue;
1268*38fd1498Szrj 		      }
1269*38fd1498Szrj 
1270*38fd1498Szrj 		    FOR_EACH_EDGE (e2, ei, e->dest->succs)
1271*38fd1498Szrj 		      {
1272*38fd1498Szrj 			int di = e2->dest->index;
1273*38fd1498Szrj 
1274*38fd1498Szrj 			if (e2->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
1275*38fd1498Szrj 			    || ((e2->flags & EDGE_CAN_FALLTHRU)
1276*38fd1498Szrj 				&& !(e2->flags & EDGE_COMPLEX)
1277*38fd1498Szrj 				&& bbd[di].start_of_trace >= 0
1278*38fd1498Szrj 				&& !connected[bbd[di].start_of_trace]
1279*38fd1498Szrj 				&& BB_PARTITION (e2->dest) == current_partition
1280*38fd1498Szrj 				&& e2->count () >= count_threshold
1281*38fd1498Szrj 				&& (!best2
1282*38fd1498Szrj 				    || e2->probability > best2->probability
1283*38fd1498Szrj 				    || (e2->probability == best2->probability
1284*38fd1498Szrj 					&& traces[bbd[di].start_of_trace].length
1285*38fd1498Szrj 					   > best2_len))))
1286*38fd1498Szrj 			  {
1287*38fd1498Szrj 			    best = e;
1288*38fd1498Szrj 			    best2 = e2;
1289*38fd1498Szrj 			    if (e2->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
1290*38fd1498Szrj 			      best2_len = traces[bbd[di].start_of_trace].length;
1291*38fd1498Szrj 			    else
1292*38fd1498Szrj 			      best2_len = INT_MAX;
1293*38fd1498Szrj 			    next_bb = e2->dest;
1294*38fd1498Szrj 			    try_copy = true;
1295*38fd1498Szrj 			  }
1296*38fd1498Szrj 		      }
1297*38fd1498Szrj 		  }
1298*38fd1498Szrj 
1299*38fd1498Szrj 	      /* Copy tiny blocks always; copy larger blocks only when the
1300*38fd1498Szrj 		 edge is traversed frequently enough.  */
1301*38fd1498Szrj 	      if (try_copy
1302*38fd1498Szrj 		  && BB_PARTITION (best->src) == BB_PARTITION (best->dest)
1303*38fd1498Szrj 		  && copy_bb_p (best->dest,
1304*38fd1498Szrj 				optimize_edge_for_speed_p (best)
1305*38fd1498Szrj 				&& (!best->count ().initialized_p ()
1306*38fd1498Szrj 				    || best->count () >= count_threshold)))
1307*38fd1498Szrj 		{
1308*38fd1498Szrj 		  basic_block new_bb;
1309*38fd1498Szrj 
1310*38fd1498Szrj 		  if (dump_file)
1311*38fd1498Szrj 		    {
1312*38fd1498Szrj 		      fprintf (dump_file, "Connection: %d %d ",
1313*38fd1498Szrj 			       traces[t].last->index, best->dest->index);
1314*38fd1498Szrj 		      if (!next_bb)
1315*38fd1498Szrj 			fputc ('\n', dump_file);
1316*38fd1498Szrj 		      else if (next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
1317*38fd1498Szrj 			fprintf (dump_file, "exit\n");
1318*38fd1498Szrj 		      else
1319*38fd1498Szrj 			fprintf (dump_file, "%d\n", next_bb->index);
1320*38fd1498Szrj 		    }
1321*38fd1498Szrj 
1322*38fd1498Szrj 		  new_bb = copy_bb (best->dest, best, traces[t].last, t);
1323*38fd1498Szrj 		  traces[t].last = new_bb;
1324*38fd1498Szrj 		  if (next_bb && next_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
1325*38fd1498Szrj 		    {
1326*38fd1498Szrj 		      t = bbd[next_bb->index].start_of_trace;
1327*38fd1498Szrj 		      traces[last_trace].last->aux = traces[t].first;
1328*38fd1498Szrj 		      connected[t] = true;
1329*38fd1498Szrj 		      last_trace = t;
1330*38fd1498Szrj 		    }
1331*38fd1498Szrj 		  else
1332*38fd1498Szrj 		    break;	/* Stop finding the successor traces.  */
1333*38fd1498Szrj 		}
1334*38fd1498Szrj 	      else
1335*38fd1498Szrj 		break;	/* Stop finding the successor traces.  */
1336*38fd1498Szrj 	    }
1337*38fd1498Szrj 	}
1338*38fd1498Szrj     }
1339*38fd1498Szrj 
1340*38fd1498Szrj   if (dump_file)
1341*38fd1498Szrj     {
1342*38fd1498Szrj       basic_block bb;
1343*38fd1498Szrj 
1344*38fd1498Szrj       fprintf (dump_file, "Final order:\n");
1345*38fd1498Szrj       for (bb = traces[0].first; bb; bb = (basic_block) bb->aux)
1346*38fd1498Szrj 	fprintf (dump_file, "%d ", bb->index);
1347*38fd1498Szrj       fprintf (dump_file, "\n");
1348*38fd1498Szrj       fflush (dump_file);
1349*38fd1498Szrj     }
1350*38fd1498Szrj 
1351*38fd1498Szrj   FREE (connected);
1352*38fd1498Szrj }
1353*38fd1498Szrj 
1354*38fd1498Szrj /* Return true when BB can and should be copied. CODE_MAY_GROW is true
1355*38fd1498Szrj    when code size is allowed to grow by duplication.  */
1356*38fd1498Szrj 
1357*38fd1498Szrj static bool
1358*38fd1498Szrj copy_bb_p (const_basic_block bb, int code_may_grow)
1359*38fd1498Szrj {
1360*38fd1498Szrj   int size = 0;
1361*38fd1498Szrj   int max_size = uncond_jump_length;
1362*38fd1498Szrj   rtx_insn *insn;
1363*38fd1498Szrj 
1364*38fd1498Szrj   if (EDGE_COUNT (bb->preds) < 2)
1365*38fd1498Szrj     return false;
1366*38fd1498Szrj   if (!can_duplicate_block_p (bb))
1367*38fd1498Szrj     return false;
1368*38fd1498Szrj 
1369*38fd1498Szrj   /* Avoid duplicating blocks which have many successors (PR/13430).  */
1370*38fd1498Szrj   if (EDGE_COUNT (bb->succs) > 8)
1371*38fd1498Szrj     return false;
1372*38fd1498Szrj 
1373*38fd1498Szrj   if (code_may_grow && optimize_bb_for_speed_p (bb))
1374*38fd1498Szrj     max_size *= PARAM_VALUE (PARAM_MAX_GROW_COPY_BB_INSNS);
1375*38fd1498Szrj 
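  /* Estimate the size of the block as the sum of the minimum lengths of
     its instructions.  */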
1376*38fd1498Szrj   FOR_BB_INSNS (bb, insn)
1377*38fd1498Szrj     {
1378*38fd1498Szrj       if (INSN_P (insn))
1379*38fd1498Szrj 	size += get_attr_min_length (insn);
1380*38fd1498Szrj     }
1381*38fd1498Szrj 
1382*38fd1498Szrj   if (size <= max_size)
1383*38fd1498Szrj     return true;
1384*38fd1498Szrj 
1385*38fd1498Szrj   if (dump_file)
1386*38fd1498Szrj     {
1387*38fd1498Szrj       fprintf (dump_file,
1388*38fd1498Szrj 	       "Block %d can't be copied because its size = %d.\n",
1389*38fd1498Szrj 	       bb->index, size);
1390*38fd1498Szrj     }
1391*38fd1498Szrj 
1392*38fd1498Szrj   return false;
1393*38fd1498Szrj }
1394*38fd1498Szrj 
1395*38fd1498Szrj /* Return the length of an unconditional jump instruction.  */
1396*38fd1498Szrj 
1397*38fd1498Szrj int
1398*38fd1498Szrj get_uncond_jump_length (void)
1399*38fd1498Szrj {
1400*38fd1498Szrj   int length;
1401*38fd1498Szrj 
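  /* Emit a jump to a fresh label in a scratch sequence and ask the target
     for its minimum length; the sequence is discarded afterwards.  */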
1402*38fd1498Szrj   start_sequence ();
1403*38fd1498Szrj   rtx_code_label *label = emit_label (gen_label_rtx ());
1404*38fd1498Szrj   rtx_insn *jump = emit_jump_insn (targetm.gen_jump (label));
1405*38fd1498Szrj   length = get_attr_min_length (jump);
1406*38fd1498Szrj   end_sequence ();
1407*38fd1498Szrj 
1408*38fd1498Szrj   return length;
1409*38fd1498Szrj }
1410*38fd1498Szrj 
1411*38fd1498Szrj /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
1412*38fd1498Szrj    Add a new landing pad that will just jump to the old one and split the
1413*38fd1498Szrj    edges so that no EH edge crosses partitions.  */
1414*38fd1498Szrj 
1415*38fd1498Szrj static void
1416*38fd1498Szrj fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb)
1417*38fd1498Szrj {
1418*38fd1498Szrj   eh_landing_pad new_lp;
1419*38fd1498Szrj   basic_block new_bb, last_bb;
1420*38fd1498Szrj   rtx_insn *jump;
1421*38fd1498Szrj   unsigned new_partition;
1422*38fd1498Szrj   edge_iterator ei;
1423*38fd1498Szrj   edge e;
1424*38fd1498Szrj 
1425*38fd1498Szrj   /* Generate the new landing-pad structure.  */
1426*38fd1498Szrj   new_lp = gen_eh_landing_pad (old_lp->region);
1427*38fd1498Szrj   new_lp->post_landing_pad = old_lp->post_landing_pad;
1428*38fd1498Szrj   new_lp->landing_pad = gen_label_rtx ();
1429*38fd1498Szrj   LABEL_PRESERVE_P (new_lp->landing_pad) = 1;
1430*38fd1498Szrj 
1431*38fd1498Szrj   /* Put appropriate instructions in new bb.  */
1432*38fd1498Szrj   rtx_code_label *new_label = emit_label (new_lp->landing_pad);
1433*38fd1498Szrj 
1434*38fd1498Szrj   rtx_code_label *old_label = block_label (old_bb);
1435*38fd1498Szrj   jump = emit_jump_insn (targetm.gen_jump (old_label));
1436*38fd1498Szrj   JUMP_LABEL (jump) = old_label;
1437*38fd1498Szrj 
1438*38fd1498Szrj   /* Create new basic block to be dest for lp.  */
1439*38fd1498Szrj   last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
1440*38fd1498Szrj   new_bb = create_basic_block (new_label, jump, last_bb);
1441*38fd1498Szrj   new_bb->aux = last_bb->aux;
1442*38fd1498Szrj   new_bb->count = old_bb->count;
1443*38fd1498Szrj   last_bb->aux = new_bb;
1444*38fd1498Szrj 
1445*38fd1498Szrj   emit_barrier_after_bb (new_bb);
1446*38fd1498Szrj 
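  /* The new landing pad reaches the old one through the unconditional
     jump emitted above.  */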
1447*38fd1498Szrj   make_single_succ_edge (new_bb, old_bb, 0);
1448*38fd1498Szrj 
1449*38fd1498Szrj   /* Make sure new bb is in the other partition.  */
1450*38fd1498Szrj   new_partition = BB_PARTITION (old_bb);
1451*38fd1498Szrj   new_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
1452*38fd1498Szrj   BB_SET_PARTITION (new_bb, new_partition);
1453*38fd1498Szrj 
1454*38fd1498Szrj   /* Fix up the edges.  */
1455*38fd1498Szrj   for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
1456*38fd1498Szrj     if (e->src != new_bb && BB_PARTITION (e->src) == new_partition)
1457*38fd1498Szrj       {
1458*38fd1498Szrj 	rtx_insn *insn = BB_END (e->src);
1459*38fd1498Szrj 	rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
1460*38fd1498Szrj 
1461*38fd1498Szrj 	gcc_assert (note != NULL);
1462*38fd1498Szrj 	gcc_checking_assert (INTVAL (XEXP (note, 0)) == old_lp->index);
1463*38fd1498Szrj 	XEXP (note, 0) = GEN_INT (new_lp->index);
1464*38fd1498Szrj 
1465*38fd1498Szrj 	/* Adjust the edge to the new destination.  */
1466*38fd1498Szrj 	redirect_edge_succ (e, new_bb);
1467*38fd1498Szrj       }
1468*38fd1498Szrj     else
1469*38fd1498Szrj       ei_next (&ei);
1470*38fd1498Szrj }
1471*38fd1498Szrj 
1472*38fd1498Szrj 
1473*38fd1498Szrj /* Ensure that all hot bbs are included in a hot path through the
1474*38fd1498Szrj    procedure. This is done by calling this function twice, once
1475*38fd1498Szrj    with WALK_UP true (to look for paths from the entry to hot bbs) and
1476*38fd1498Szrj    once with WALK_UP false (to look for paths from hot bbs to the exit).
1477*38fd1498Szrj    Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
1478*38fd1498Szrj    to BBS_IN_HOT_PARTITION.  */
1479*38fd1498Szrj 
1480*38fd1498Szrj static unsigned int
1481*38fd1498Szrj sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
1482*38fd1498Szrj                     vec<basic_block> *bbs_in_hot_partition)
1483*38fd1498Szrj {
1484*38fd1498Szrj   /* Callers check this.  */
1485*38fd1498Szrj   gcc_checking_assert (cold_bb_count);
1486*38fd1498Szrj 
1487*38fd1498Szrj   /* Keep examining hot bbs while we still have some left to check
1488*38fd1498Szrj      and there are remaining cold bbs.  */
1489*38fd1498Szrj   vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
1490*38fd1498Szrj   while (! hot_bbs_to_check.is_empty ()
1491*38fd1498Szrj          && cold_bb_count)
1492*38fd1498Szrj     {
1493*38fd1498Szrj       basic_block bb = hot_bbs_to_check.pop ();
1494*38fd1498Szrj       vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
1495*38fd1498Szrj       edge e;
1496*38fd1498Szrj       edge_iterator ei;
1497*38fd1498Szrj       profile_probability highest_probability
1498*38fd1498Szrj 				 = profile_probability::uninitialized ();
1499*38fd1498Szrj       profile_count highest_count = profile_count::uninitialized ();
1500*38fd1498Szrj       bool found = false;
1501*38fd1498Szrj 
1502*38fd1498Szrj       /* Walk the preds/succs and check if there is at least one already
1503*38fd1498Szrj          marked hot. Keep track of the most frequent pred/succ so that we
1504*38fd1498Szrj          can mark it hot if we don't find one.  */
1505*38fd1498Szrj       FOR_EACH_EDGE (e, ei, edges)
1506*38fd1498Szrj         {
1507*38fd1498Szrj           basic_block reach_bb = walk_up ? e->src : e->dest;
1508*38fd1498Szrj 
1509*38fd1498Szrj           if (e->flags & EDGE_DFS_BACK)
1510*38fd1498Szrj             continue;
1511*38fd1498Szrj 
1512*38fd1498Szrj 	  /* Do not expect profile insanities when profile was not adjusted.  */
1513*38fd1498Szrj 	  if (e->probability == profile_probability::never ()
1514*38fd1498Szrj 	      || e->count () == profile_count::zero ())
1515*38fd1498Szrj 	    continue;
1516*38fd1498Szrj 
1517*38fd1498Szrj           if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
1518*38fd1498Szrj             {
1519*38fd1498Szrj               found = true;
1520*38fd1498Szrj               break;
1521*38fd1498Szrj             }
1522*38fd1498Szrj           /* The following loop will select the hottest edge using the
1523*38fd1498Szrj              edge count if it is non-zero, and fall back to the edge
1524*38fd1498Szrj              probability otherwise.  */
1525*38fd1498Szrj           if (!(e->count () > highest_count))
1526*38fd1498Szrj             highest_count = e->count ();
1527*38fd1498Szrj           if (!highest_probability.initialized_p ()
1528*38fd1498Szrj 	      || e->probability > highest_probability)
1529*38fd1498Szrj             highest_probability = e->probability;
1530*38fd1498Szrj         }
1531*38fd1498Szrj 
1532*38fd1498Szrj       /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
1533*38fd1498Szrj          block (or unpartitioned, e.g. the entry block) then it is ok. If not,
1534*38fd1498Szrj          then the most frequent pred (or succ) needs to be adjusted.  In the
1535*38fd1498Szrj          case where multiple preds/succs have the same frequency (e.g. a
1536*38fd1498Szrj          50-50 branch), then both will be adjusted.  */
1537*38fd1498Szrj       if (found)
1538*38fd1498Szrj         continue;
1539*38fd1498Szrj 
1540*38fd1498Szrj       FOR_EACH_EDGE (e, ei, edges)
1541*38fd1498Szrj         {
1542*38fd1498Szrj           if (e->flags & EDGE_DFS_BACK)
1543*38fd1498Szrj             continue;
1544*38fd1498Szrj 	  /* Do not expect profile insanities when profile was not adjusted.  */
1545*38fd1498Szrj 	  if (e->probability == profile_probability::never ()
1546*38fd1498Szrj 	      || e->count () == profile_count::zero ())
1547*38fd1498Szrj 	    continue;
1548*38fd1498Szrj           /* Select the hottest edge using the edge count if it is non-zero;
1549*38fd1498Szrj              otherwise fall back to the edge probability.  */
1550*38fd1498Szrj           if (highest_count.initialized_p ())
1551*38fd1498Szrj             {
1552*38fd1498Szrj               if (!(e->count () >= highest_count))
1553*38fd1498Szrj                 continue;
1554*38fd1498Szrj             }
1555*38fd1498Szrj           else if (!(e->probability >= highest_probability))
1556*38fd1498Szrj             continue;
1557*38fd1498Szrj 
1558*38fd1498Szrj           basic_block reach_bb = walk_up ? e->src : e->dest;
1559*38fd1498Szrj 
1560*38fd1498Szrj           /* We have a hot bb whose most frequent pred (or succ, when
1561*38fd1498Szrj              walking down) is cold.  That block needs to be re-marked hot.  */
1562*38fd1498Szrj           BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
1563*38fd1498Szrj 	  if (dump_file)
1564*38fd1498Szrj 	    fprintf (dump_file, "Promoting bb %i to hot partition to sanitize "
1565*38fd1498Szrj 		     "profile of bb %i in %s walk\n", reach_bb->index,
1566*38fd1498Szrj 		     bb->index, walk_up ? "backward" : "forward");
1567*38fd1498Szrj           cold_bb_count--;
1568*38fd1498Szrj 
1569*38fd1498Szrj           /* Now we need to examine newly-hot reach_bb to see if it is also
1570*38fd1498Szrj              dominated by a cold bb.  */
1571*38fd1498Szrj           bbs_in_hot_partition->safe_push (reach_bb);
1572*38fd1498Szrj           hot_bbs_to_check.safe_push (reach_bb);
1573*38fd1498Szrj         }
1574*38fd1498Szrj     }
1575*38fd1498Szrj 
1576*38fd1498Szrj   return cold_bb_count;
1577*38fd1498Szrj }
1578*38fd1498Szrj 
1579*38fd1498Szrj 
1580*38fd1498Szrj /* Find the basic blocks that are rarely executed and need to be moved to
1581*38fd1498Szrj    a separate section of the .o file (to cut down on paging and improve
1582*38fd1498Szrj    cache locality).  Return a vector of all edges that cross.  */
1583*38fd1498Szrj 
1584*38fd1498Szrj static vec<edge>
1585*38fd1498Szrj find_rarely_executed_basic_blocks_and_crossing_edges (void)
1586*38fd1498Szrj {
1587*38fd1498Szrj   vec<edge> crossing_edges = vNULL;
1588*38fd1498Szrj   basic_block bb;
1589*38fd1498Szrj   edge e;
1590*38fd1498Szrj   edge_iterator ei;
1591*38fd1498Szrj   unsigned int cold_bb_count = 0;
1592*38fd1498Szrj   auto_vec<basic_block> bbs_in_hot_partition;
1593*38fd1498Szrj 
1594*38fd1498Szrj   propagate_unlikely_bbs_forward ();
1595*38fd1498Szrj 
1596*38fd1498Szrj   /* Mark which partition (hot/cold) each basic block belongs in.  */
1597*38fd1498Szrj   FOR_EACH_BB_FN (bb, cfun)
1598*38fd1498Szrj     {
1599*38fd1498Szrj       bool cold_bb = false;
1600*38fd1498Szrj 
1601*38fd1498Szrj       if (probably_never_executed_bb_p (cfun, bb))
1602*38fd1498Szrj         {
1603*38fd1498Szrj           /* Handle profile insanities created by upstream optimizations
1604*38fd1498Szrj              by also checking the incoming edge weights. If there is a non-cold
1605*38fd1498Szrj              incoming edge, conservatively prevent this block from being split
1606*38fd1498Szrj              into the cold section.  */
1607*38fd1498Szrj           cold_bb = true;
1608*38fd1498Szrj           FOR_EACH_EDGE (e, ei, bb->preds)
1609*38fd1498Szrj             if (!probably_never_executed_edge_p (cfun, e))
1610*38fd1498Szrj               {
1611*38fd1498Szrj                 cold_bb = false;
1612*38fd1498Szrj                 break;
1613*38fd1498Szrj               }
1614*38fd1498Szrj         }
1615*38fd1498Szrj       if (cold_bb)
1616*38fd1498Szrj         {
1617*38fd1498Szrj           BB_SET_PARTITION (bb, BB_COLD_PARTITION);
1618*38fd1498Szrj           cold_bb_count++;
1619*38fd1498Szrj         }
1620*38fd1498Szrj       else
1621*38fd1498Szrj         {
1622*38fd1498Szrj           BB_SET_PARTITION (bb, BB_HOT_PARTITION);
1623*38fd1498Szrj           bbs_in_hot_partition.safe_push (bb);
1624*38fd1498Szrj         }
1625*38fd1498Szrj     }
1626*38fd1498Szrj 
1627*38fd1498Szrj   /* Ensure that hot bbs are included along a hot path from the entry to exit.
1628*38fd1498Szrj      There are several ways in which cold bbs can end up along all paths
1629*38fd1498Szrj      to/from a hot bb.  One is that there are edge weight insanities
1630*38fd1498Szrj      due to optimization phases that do not properly update basic block profile
1631*38fd1498Szrj      counts. The second is that the entry of the function may not be hot, because
1632*38fd1498Szrj      it is entered fewer times than the number of profile training runs, but there
1633*38fd1498Szrj      is a loop inside the function that causes blocks within the function to be
1634*38fd1498Szrj      above the threshold for hotness. This is fixed by walking up from hot bbs
1635*38fd1498Szrj      to the entry block, and then down from hot bbs to the exit, performing
1636*38fd1498Szrj      partitioning fixups as necessary.  */
1637*38fd1498Szrj   if (cold_bb_count)
1638*38fd1498Szrj     {
1639*38fd1498Szrj       mark_dfs_back_edges ();
1640*38fd1498Szrj       cold_bb_count = sanitize_hot_paths (true, cold_bb_count,
1641*38fd1498Szrj                                           &bbs_in_hot_partition);
1642*38fd1498Szrj       if (cold_bb_count)
1643*38fd1498Szrj         sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition);
1644*38fd1498Szrj 
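      /* Any block still not reachable along a purely hot path is forced
	 into the cold partition.  */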
1645*38fd1498Szrj       hash_set <basic_block> set;
1646*38fd1498Szrj       find_bbs_reachable_by_hot_paths (&set);
1647*38fd1498Szrj       FOR_EACH_BB_FN (bb, cfun)
1648*38fd1498Szrj 	if (!set.contains (bb))
1649*38fd1498Szrj 	  BB_SET_PARTITION (bb, BB_COLD_PARTITION);
1650*38fd1498Szrj     }
1651*38fd1498Szrj 
1652*38fd1498Szrj   /* The format of .gcc_except_table does not allow landing pads to
1653*38fd1498Szrj    be in a different partition than the throw.  Fix this by either
1654*38fd1498Szrj      moving or duplicating the landing pads.  */
1655*38fd1498Szrj   if (cfun->eh->lp_array)
1656*38fd1498Szrj     {
1657*38fd1498Szrj       unsigned i;
1658*38fd1498Szrj       eh_landing_pad lp;
1659*38fd1498Szrj 
1660*38fd1498Szrj       FOR_EACH_VEC_ELT (*cfun->eh->lp_array, i, lp)
1661*38fd1498Szrj 	{
1662*38fd1498Szrj 	  bool all_same, all_diff;
1663*38fd1498Szrj 
1664*38fd1498Szrj 	  if (lp == NULL
1665*38fd1498Szrj 	      || lp->landing_pad == NULL_RTX
1666*38fd1498Szrj 	      || !LABEL_P (lp->landing_pad))
1667*38fd1498Szrj 	    continue;
1668*38fd1498Szrj 
1669*38fd1498Szrj 	  all_same = all_diff = true;
1670*38fd1498Szrj 	  bb = BLOCK_FOR_INSN (lp->landing_pad);
1671*38fd1498Szrj 	  FOR_EACH_EDGE (e, ei, bb->preds)
1672*38fd1498Szrj 	    {
1673*38fd1498Szrj 	      gcc_assert (e->flags & EDGE_EH);
1674*38fd1498Szrj 	      if (BB_PARTITION (bb) == BB_PARTITION (e->src))
1675*38fd1498Szrj 		all_diff = false;
1676*38fd1498Szrj 	      else
1677*38fd1498Szrj 		all_same = false;
1678*38fd1498Szrj 	    }
1679*38fd1498Szrj 
1680*38fd1498Szrj 	  if (all_same)
1681*38fd1498Szrj 	    ;
1682*38fd1498Szrj 	  else if (all_diff)
1683*38fd1498Szrj 	    {
1684*38fd1498Szrj 	      int which = BB_PARTITION (bb);
1685*38fd1498Szrj 	      which ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
1686*38fd1498Szrj 	      BB_SET_PARTITION (bb, which);
1687*38fd1498Szrj 	    }
1688*38fd1498Szrj 	  else
1689*38fd1498Szrj 	    fix_up_crossing_landing_pad (lp, bb);
1690*38fd1498Szrj 	}
1691*38fd1498Szrj     }
1692*38fd1498Szrj 
1693*38fd1498Szrj   /* Mark every edge that crosses between sections.  */
1694*38fd1498Szrj 
1695*38fd1498Szrj   FOR_EACH_BB_FN (bb, cfun)
1696*38fd1498Szrj     FOR_EACH_EDGE (e, ei, bb->succs)
1697*38fd1498Szrj       {
1698*38fd1498Szrj 	unsigned int flags = e->flags;
1699*38fd1498Szrj 
1700*38fd1498Szrj         /* We should never have EDGE_CROSSING set yet.  */
1701*38fd1498Szrj 	gcc_checking_assert ((flags & EDGE_CROSSING) == 0);
1702*38fd1498Szrj 
1703*38fd1498Szrj 	if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
1704*38fd1498Szrj 	    && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1705*38fd1498Szrj 	    && BB_PARTITION (e->src) != BB_PARTITION (e->dest))
1706*38fd1498Szrj 	  {
1707*38fd1498Szrj 	    crossing_edges.safe_push (e);
1708*38fd1498Szrj 	    flags |= EDGE_CROSSING;
1709*38fd1498Szrj 	  }
1710*38fd1498Szrj 
1711*38fd1498Szrj 	/* Now that we've split eh edges as appropriate, allow landing pads
1712*38fd1498Szrj 	   to be merged with the post-landing pads.  */
1713*38fd1498Szrj 	flags &= ~EDGE_PRESERVE;
1714*38fd1498Szrj 
1715*38fd1498Szrj 	e->flags = flags;
1716*38fd1498Szrj       }
1717*38fd1498Szrj 
1718*38fd1498Szrj   return crossing_edges;
1719*38fd1498Szrj }
1720*38fd1498Szrj 
1721*38fd1498Szrj /* Set the flag EDGE_CAN_FALLTHRU for edges that can be fallthru.  */
1722*38fd1498Szrj 
1723*38fd1498Szrj static void
1724*38fd1498Szrj set_edge_can_fallthru_flag (void)
1725*38fd1498Szrj {
1726*38fd1498Szrj   basic_block bb;
1727*38fd1498Szrj 
1728*38fd1498Szrj   FOR_EACH_BB_FN (bb, cfun)
1729*38fd1498Szrj     {
1730*38fd1498Szrj       edge e;
1731*38fd1498Szrj       edge_iterator ei;
1732*38fd1498Szrj 
1733*38fd1498Szrj       FOR_EACH_EDGE (e, ei, bb->succs)
1734*38fd1498Szrj 	{
1735*38fd1498Szrj 	  e->flags &= ~EDGE_CAN_FALLTHRU;
1736*38fd1498Szrj 
1737*38fd1498Szrj 	  /* The FALLTHRU edge is also CAN_FALLTHRU edge.  */
1738*38fd1498Szrj 	  if (e->flags & EDGE_FALLTHRU)
1739*38fd1498Szrj 	    e->flags |= EDGE_CAN_FALLTHRU;
1740*38fd1498Szrj 	}
1741*38fd1498Szrj 
1742*38fd1498Szrj       /* If the BB ends with an invertible condjump all (2) edges are
1743*38fd1498Szrj 	 CAN_FALLTHRU edges.  */
1744*38fd1498Szrj       if (EDGE_COUNT (bb->succs) != 2)
1745*38fd1498Szrj 	continue;
1746*38fd1498Szrj       if (!any_condjump_p (BB_END (bb)))
1747*38fd1498Szrj 	continue;
1748*38fd1498Szrj 
1749*38fd1498Szrj       rtx_jump_insn *bb_end_jump = as_a <rtx_jump_insn *> (BB_END (bb));
1750*38fd1498Szrj       if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0))
1751*38fd1498Szrj 	continue;
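      /* The inversion succeeded, so invert the jump back; we only needed
	 to prove that the condjump is invertible.  */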
1752*38fd1498Szrj       invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0);
1753*38fd1498Szrj       EDGE_SUCC (bb, 0)->flags |= EDGE_CAN_FALLTHRU;
1754*38fd1498Szrj       EDGE_SUCC (bb, 1)->flags |= EDGE_CAN_FALLTHRU;
1755*38fd1498Szrj     }
1756*38fd1498Szrj }
1757*38fd1498Szrj 
1758*38fd1498Szrj /* If any destination of a crossing edge does not have a label, add a label;
1759*38fd1498Szrj    convert any easy fall-through crossing edges to unconditional jumps.  */
1760*38fd1498Szrj 
1761*38fd1498Szrj static void
1762*38fd1498Szrj add_labels_and_missing_jumps (vec<edge> crossing_edges)
1763*38fd1498Szrj {
1764*38fd1498Szrj   size_t i;
1765*38fd1498Szrj   edge e;
1766*38fd1498Szrj 
1767*38fd1498Szrj   FOR_EACH_VEC_ELT (crossing_edges, i, e)
1768*38fd1498Szrj     {
1769*38fd1498Szrj       basic_block src = e->src;
1770*38fd1498Szrj       basic_block dest = e->dest;
1771*38fd1498Szrj       rtx_jump_insn *new_jump;
1772*38fd1498Szrj 
1773*38fd1498Szrj       if (dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
1774*38fd1498Szrj 	continue;
1775*38fd1498Szrj 
1776*38fd1498Szrj       /* Make sure dest has a label.  */
1777*38fd1498Szrj       rtx_code_label *label = block_label (dest);
1778*38fd1498Szrj 
1779*38fd1498Szrj       /* Nothing to do for non-fallthru edges.  */
1780*38fd1498Szrj       if (src == ENTRY_BLOCK_PTR_FOR_FN (cfun))
1781*38fd1498Szrj 	continue;
1782*38fd1498Szrj       if ((e->flags & EDGE_FALLTHRU) == 0)
1783*38fd1498Szrj 	continue;
1784*38fd1498Szrj 
1785*38fd1498Szrj       /* If the block does not end with a control flow insn, then we
1786*38fd1498Szrj 	 can trivially add a jump to the end to fixup the crossing.
1787*38fd1498Szrj 	 Otherwise the jump will have to go in a new bb, which will
1788*38fd1498Szrj 	 be handled by fix_up_fall_thru_edges function.  */
1789*38fd1498Szrj       if (control_flow_insn_p (BB_END (src)))
1790*38fd1498Szrj 	continue;
1791*38fd1498Szrj 
1792*38fd1498Szrj       /* Make sure there's only one successor.  */
1793*38fd1498Szrj       gcc_assert (single_succ_p (src));
1794*38fd1498Szrj 
1795*38fd1498Szrj       new_jump = emit_jump_insn_after (targetm.gen_jump (label), BB_END (src));
1796*38fd1498Szrj       BB_END (src) = new_jump;
1797*38fd1498Szrj       JUMP_LABEL (new_jump) = label;
1798*38fd1498Szrj       LABEL_NUSES (label) += 1;
1799*38fd1498Szrj 
1800*38fd1498Szrj       emit_barrier_after_bb (src);
1801*38fd1498Szrj 
1802*38fd1498Szrj       /* Mark edge as non-fallthru.  */
1803*38fd1498Szrj       e->flags &= ~EDGE_FALLTHRU;
1804*38fd1498Szrj     }
1805*38fd1498Szrj }
1806*38fd1498Szrj 
1807*38fd1498Szrj /* Find any bb's where the fall-through edge is a crossing edge (note that
1808*38fd1498Szrj    these bb's must also contain a conditional jump or end with a call
1809*38fd1498Szrj    instruction; we've already dealt with fall-through edges for blocks
1810*38fd1498Szrj    that didn't have a conditional jump or didn't end with a call instruction
1811*38fd1498Szrj    in the call to add_labels_and_missing_jumps).  Convert the fall-through
1812*38fd1498Szrj    edge to non-crossing edge by inserting a new bb to fall-through into.
1813*38fd1498Szrj    The new bb will contain an unconditional jump (crossing edge) to the
1814*38fd1498Szrj    original fall through destination.  */
1815*38fd1498Szrj 
1816*38fd1498Szrj static void
1817*38fd1498Szrj fix_up_fall_thru_edges (void)
1818*38fd1498Szrj {
1819*38fd1498Szrj   basic_block cur_bb;
1820*38fd1498Szrj 
1821*38fd1498Szrj   FOR_EACH_BB_FN (cur_bb, cfun)
1822*38fd1498Szrj     {
1823*38fd1498Szrj       edge succ1;
1824*38fd1498Szrj       edge succ2;
1825*38fd1498Szrj       edge fall_thru = NULL;
1826*38fd1498Szrj       edge cond_jump = NULL;
1827*38fd1498Szrj 
1828*38fd1498Szrj       fall_thru = NULL;
1829*38fd1498Szrj       if (EDGE_COUNT (cur_bb->succs) > 0)
1830*38fd1498Szrj 	succ1 = EDGE_SUCC (cur_bb, 0);
1831*38fd1498Szrj       else
1832*38fd1498Szrj 	succ1 = NULL;
1833*38fd1498Szrj 
1834*38fd1498Szrj       if (EDGE_COUNT (cur_bb->succs) > 1)
1835*38fd1498Szrj 	succ2 = EDGE_SUCC (cur_bb, 1);
1836*38fd1498Szrj       else
1837*38fd1498Szrj 	succ2 = NULL;
1838*38fd1498Szrj 
1839*38fd1498Szrj       /* Find the fall-through edge.  */
1840*38fd1498Szrj 
1841*38fd1498Szrj       if (succ1
1842*38fd1498Szrj 	  && (succ1->flags & EDGE_FALLTHRU))
1843*38fd1498Szrj 	{
1844*38fd1498Szrj 	  fall_thru = succ1;
1845*38fd1498Szrj 	  cond_jump = succ2;
1846*38fd1498Szrj 	}
1847*38fd1498Szrj       else if (succ2
1848*38fd1498Szrj 	       && (succ2->flags & EDGE_FALLTHRU))
1849*38fd1498Szrj 	{
1850*38fd1498Szrj 	  fall_thru = succ2;
1851*38fd1498Szrj 	  cond_jump = succ1;
1852*38fd1498Szrj 	}
1853*38fd1498Szrj       else if (succ2 && EDGE_COUNT (cur_bb->succs) > 2)
1854*38fd1498Szrj 	fall_thru = find_fallthru_edge (cur_bb->succs);
1855*38fd1498Szrj 
1856*38fd1498Szrj       if (fall_thru && (fall_thru->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)))
1857*38fd1498Szrj 	{
1858*38fd1498Szrj 	  /* Check to see if the fall-thru edge is a crossing edge.  */
1859*38fd1498Szrj 
1860*38fd1498Szrj 	  if (fall_thru->flags & EDGE_CROSSING)
1861*38fd1498Szrj 	    {
1862*38fd1498Szrj 	      /* The fall_thru edge crosses; now check the cond jump edge, if
1863*38fd1498Szrj 		 it exists.  */
1864*38fd1498Szrj 
1865*38fd1498Szrj 	      bool cond_jump_crosses = true;
1866*38fd1498Szrj 	      int invert_worked = 0;
1867*38fd1498Szrj 	      rtx_insn *old_jump = BB_END (cur_bb);
1868*38fd1498Szrj 
1869*38fd1498Szrj 	      /* Find the jump instruction, if there is one.  */
1870*38fd1498Szrj 
1871*38fd1498Szrj 	      if (cond_jump)
1872*38fd1498Szrj 		{
1873*38fd1498Szrj 		  if (!(cond_jump->flags & EDGE_CROSSING))
1874*38fd1498Szrj 		    cond_jump_crosses = false;
1875*38fd1498Szrj 
1876*38fd1498Szrj 		  /* We know the fall-thru edge crosses; if the cond
1877*38fd1498Szrj 		     jump edge does NOT cross, and its destination is the
1878*38fd1498Szrj 		     next block in the bb order, invert the jump
1879*38fd1498Szrj 		     (i.e. fix it so the fall through does not cross and
1880*38fd1498Szrj 		     the cond jump does).  */
1881*38fd1498Szrj 
1882*38fd1498Szrj 		  if (!cond_jump_crosses)
1883*38fd1498Szrj 		    {
1884*38fd1498Szrj 		      /* Find label in fall_thru block. We've already added
1885*38fd1498Szrj 			 any missing labels, so there must be one.  */
1886*38fd1498Szrj 
1887*38fd1498Szrj 		      rtx_code_label *fall_thru_label
1888*38fd1498Szrj 			= block_label (fall_thru->dest);
1889*38fd1498Szrj 
1890*38fd1498Szrj 		      if (old_jump && fall_thru_label)
1891*38fd1498Szrj 			{
1892*38fd1498Szrj 			  rtx_jump_insn *old_jump_insn
1893*38fd1498Szrj 			    = dyn_cast <rtx_jump_insn *> (old_jump);
1894*38fd1498Szrj 			  if (old_jump_insn)
1895*38fd1498Szrj 			    invert_worked = invert_jump (old_jump_insn,
1896*38fd1498Szrj 							 fall_thru_label, 0);
1897*38fd1498Szrj 			}
1898*38fd1498Szrj 
1899*38fd1498Szrj 		      if (invert_worked)
1900*38fd1498Szrj 			{
1901*38fd1498Szrj 			  fall_thru->flags &= ~EDGE_FALLTHRU;
1902*38fd1498Szrj 			  cond_jump->flags |= EDGE_FALLTHRU;
1903*38fd1498Szrj 			  update_br_prob_note (cur_bb);
1904*38fd1498Szrj 			  std::swap (fall_thru, cond_jump);
1905*38fd1498Szrj 			  cond_jump->flags |= EDGE_CROSSING;
1906*38fd1498Szrj 			  fall_thru->flags &= ~EDGE_CROSSING;
1907*38fd1498Szrj 			}
1908*38fd1498Szrj 		    }
1909*38fd1498Szrj 		}
1910*38fd1498Szrj 
1911*38fd1498Szrj 	      if (cond_jump_crosses || !invert_worked)
1912*38fd1498Szrj 		{
1913*38fd1498Szrj 		  /* This is the case where both edges out of the basic
1914*38fd1498Szrj 		     block are crossing edges. Here we will fix up the
1915*38fd1498Szrj 		     fall through edge. The jump edge will be taken care
1916*38fd1498Szrj 		     of later.  The EDGE_CROSSING flag of fall_thru edge
1917*38fd1498Szrj 		     is unset before the call to force_nonfallthru
1918*38fd1498Szrj 		     function because if a new basic-block is created
1919*38fd1498Szrj 		     this edge remains in the current section boundary
1920*38fd1498Szrj 		     while the edge between new_bb and the fall_thru->dest
1921*38fd1498Szrj 		     becomes EDGE_CROSSING.  */
1922*38fd1498Szrj 
1923*38fd1498Szrj 		  fall_thru->flags &= ~EDGE_CROSSING;
1924*38fd1498Szrj 		  basic_block new_bb = force_nonfallthru (fall_thru);
1925*38fd1498Szrj 
1926*38fd1498Szrj 		  if (new_bb)
1927*38fd1498Szrj 		    {
1928*38fd1498Szrj 		      new_bb->aux = cur_bb->aux;
1929*38fd1498Szrj 		      cur_bb->aux = new_bb;
1930*38fd1498Szrj 
1931*38fd1498Szrj                       /* This is done by force_nonfallthru_and_redirect.  */
1932*38fd1498Szrj 		      gcc_assert (BB_PARTITION (new_bb)
1933*38fd1498Szrj                                   == BB_PARTITION (cur_bb));
1934*38fd1498Szrj 
1935*38fd1498Szrj 		      single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
1936*38fd1498Szrj 		    }
1937*38fd1498Szrj 		  else
1938*38fd1498Szrj 		    {
1939*38fd1498Szrj 		      /* If a new basic-block was not created; restore
1940*38fd1498Szrj 			 the EDGE_CROSSING flag.  */
1941*38fd1498Szrj 		      fall_thru->flags |= EDGE_CROSSING;
1942*38fd1498Szrj 		    }
1943*38fd1498Szrj 
1944*38fd1498Szrj 		  /* Add a barrier after the new jump.  */
1945*38fd1498Szrj 		  emit_barrier_after_bb (new_bb ? new_bb : cur_bb);
1946*38fd1498Szrj 		}
1947*38fd1498Szrj 	    }
1948*38fd1498Szrj 	}
1949*38fd1498Szrj     }
1950*38fd1498Szrj }
1951*38fd1498Szrj 
1952*38fd1498Szrj /* This function checks the destination block of a "crossing jump" to
1953*38fd1498Szrj    see if it has any crossing predecessors that begin with a code label
1954*38fd1498Szrj    and end with an unconditional jump.  If so, it returns that predecessor
1955*38fd1498Szrj    block.  (This is to avoid creating lots of new basic blocks that all
1956*38fd1498Szrj    contain unconditional jumps to the same destination).  */
1957*38fd1498Szrj 
1958*38fd1498Szrj static basic_block
1959*38fd1498Szrj find_jump_block (basic_block jump_dest)
1960*38fd1498Szrj {
1961*38fd1498Szrj   basic_block source_bb = NULL;
1962*38fd1498Szrj   edge e;
1963*38fd1498Szrj   rtx_insn *insn;
1964*38fd1498Szrj   edge_iterator ei;
1965*38fd1498Szrj 
1966*38fd1498Szrj   FOR_EACH_EDGE (e, ei, jump_dest->preds)
1967*38fd1498Szrj     if (e->flags & EDGE_CROSSING)
1968*38fd1498Szrj       {
1969*38fd1498Szrj 	basic_block src = e->src;
1970*38fd1498Szrj 
1971*38fd1498Szrj 	/* Check each predecessor to see if it has a label, and contains
1972*38fd1498Szrj 	   only one executable instruction, which is an unconditional jump.
1973*38fd1498Szrj 	   If so, we can use it.  */
1974*38fd1498Szrj 
1975*38fd1498Szrj 	if (LABEL_P (BB_HEAD (src)))
1976*38fd1498Szrj 	  for (insn = BB_HEAD (src);
1977*38fd1498Szrj 	       !INSN_P (insn) && insn != NEXT_INSN (BB_END (src));
1978*38fd1498Szrj 	       insn = NEXT_INSN (insn))
1979*38fd1498Szrj 	    {
1980*38fd1498Szrj 	      if (INSN_P (insn)
1981*38fd1498Szrj 		  && insn == BB_END (src)
1982*38fd1498Szrj 		  && JUMP_P (insn)
1983*38fd1498Szrj 		  && !any_condjump_p (insn))
1984*38fd1498Szrj 		{
1985*38fd1498Szrj 		  source_bb = src;
1986*38fd1498Szrj 		  break;
1987*38fd1498Szrj 		}
1988*38fd1498Szrj 	    }
1989*38fd1498Szrj 
1990*38fd1498Szrj 	if (source_bb)
1991*38fd1498Szrj 	  break;
1992*38fd1498Szrj       }
1993*38fd1498Szrj 
1994*38fd1498Szrj   return source_bb;
1995*38fd1498Szrj }
1996*38fd1498Szrj 
1997*38fd1498Szrj /* Find all BB's with conditional jumps that are crossing edges;
1998*38fd1498Szrj    insert a new bb and make the conditional jump branch to the new
1999*38fd1498Szrj    bb instead (make the new bb the same color so the conditional branch
2000*38fd1498Szrj    won't be a 'crossing' edge).  Insert an unconditional jump from the
2001*38fd1498Szrj    new bb to the original destination of the conditional jump.  */
2002*38fd1498Szrj 
2003*38fd1498Szrj static void
2004*38fd1498Szrj fix_crossing_conditional_branches (void)
2005*38fd1498Szrj {
2006*38fd1498Szrj   basic_block cur_bb;
2007*38fd1498Szrj   basic_block new_bb;
2008*38fd1498Szrj   basic_block dest;
2009*38fd1498Szrj   edge succ1;
2010*38fd1498Szrj   edge succ2;
2011*38fd1498Szrj   edge crossing_edge;
2012*38fd1498Szrj   edge new_edge;
2013*38fd1498Szrj   rtx set_src;
2014*38fd1498Szrj   rtx old_label = NULL_RTX;
2015*38fd1498Szrj   rtx_code_label *new_label;
2016*38fd1498Szrj 
2017*38fd1498Szrj   FOR_EACH_BB_FN (cur_bb, cfun)
2018*38fd1498Szrj     {
2019*38fd1498Szrj       crossing_edge = NULL;
2020*38fd1498Szrj       if (EDGE_COUNT (cur_bb->succs) > 0)
2021*38fd1498Szrj 	succ1 = EDGE_SUCC (cur_bb, 0);
2022*38fd1498Szrj       else
2023*38fd1498Szrj 	succ1 = NULL;
2024*38fd1498Szrj 
2025*38fd1498Szrj       if (EDGE_COUNT (cur_bb->succs) > 1)
2026*38fd1498Szrj 	succ2 = EDGE_SUCC (cur_bb, 1);
2027*38fd1498Szrj       else
2028*38fd1498Szrj 	succ2 = NULL;
2029*38fd1498Szrj 
2030*38fd1498Szrj       /* We already took care of fall-through edges, so only one successor
2031*38fd1498Szrj 	 can be a crossing edge.  */
2032*38fd1498Szrj 
2033*38fd1498Szrj       if (succ1 && (succ1->flags & EDGE_CROSSING))
2034*38fd1498Szrj 	crossing_edge = succ1;
2035*38fd1498Szrj       else if (succ2 && (succ2->flags & EDGE_CROSSING))
2036*38fd1498Szrj 	crossing_edge = succ2;
2037*38fd1498Szrj 
2038*38fd1498Szrj       if (crossing_edge)
2039*38fd1498Szrj 	{
2040*38fd1498Szrj 	  rtx_insn *old_jump = BB_END (cur_bb);
2041*38fd1498Szrj 
2042*38fd1498Szrj 	  /* Check to make sure the jump instruction is a
2043*38fd1498Szrj 	     conditional jump.  */
2044*38fd1498Szrj 
2045*38fd1498Szrj 	  set_src = NULL_RTX;
2046*38fd1498Szrj 
2047*38fd1498Szrj 	  if (any_condjump_p (old_jump))
2048*38fd1498Szrj 	    {
2049*38fd1498Szrj 	      if (GET_CODE (PATTERN (old_jump)) == SET)
2050*38fd1498Szrj 		set_src = SET_SRC (PATTERN (old_jump));
2051*38fd1498Szrj 	      else if (GET_CODE (PATTERN (old_jump)) == PARALLEL)
2052*38fd1498Szrj 		{
2053*38fd1498Szrj 		  set_src = XVECEXP (PATTERN (old_jump), 0, 0);
2054*38fd1498Szrj 		  if (GET_CODE (set_src) == SET)
2055*38fd1498Szrj 		    set_src = SET_SRC (set_src);
2056*38fd1498Szrj 		  else
2057*38fd1498Szrj 		    set_src = NULL_RTX;
2058*38fd1498Szrj 		}
2059*38fd1498Szrj 	    }
2060*38fd1498Szrj 
2061*38fd1498Szrj 	  if (set_src && (GET_CODE (set_src) == IF_THEN_ELSE))
2062*38fd1498Szrj 	    {
2063*38fd1498Szrj 	      rtx_jump_insn *old_jump_insn =
2064*38fd1498Szrj 			as_a <rtx_jump_insn *> (old_jump);
2065*38fd1498Szrj 
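	      /* The taken destination is whichever arm of the IF_THEN_ELSE
		 is not PC.  */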
2066*38fd1498Szrj 	      if (GET_CODE (XEXP (set_src, 1)) == PC)
2067*38fd1498Szrj 		old_label = XEXP (set_src, 2);
2068*38fd1498Szrj 	      else if (GET_CODE (XEXP (set_src, 2)) == PC)
2069*38fd1498Szrj 		old_label = XEXP (set_src, 1);
2070*38fd1498Szrj 
2071*38fd1498Szrj 	      /* Check to see if new bb for jumping to that dest has
2072*38fd1498Szrj 		 already been created; if so, use it; if not, create
2073*38fd1498Szrj 		 a new one.  */
2074*38fd1498Szrj 
2075*38fd1498Szrj 	      new_bb = find_jump_block (crossing_edge->dest);
2076*38fd1498Szrj 
2077*38fd1498Szrj 	      if (new_bb)
2078*38fd1498Szrj 		new_label = block_label (new_bb);
2079*38fd1498Szrj 	      else
2080*38fd1498Szrj 		{
2081*38fd1498Szrj 		  basic_block last_bb;
2082*38fd1498Szrj 		  rtx_code_label *old_jump_target;
2083*38fd1498Szrj 		  rtx_jump_insn *new_jump;
2084*38fd1498Szrj 
2085*38fd1498Szrj 		  /* Create new basic block to be dest for
2086*38fd1498Szrj 		     conditional jump.  */
2087*38fd1498Szrj 
2088*38fd1498Szrj 		  /* Put appropriate instructions in new bb.  */
2089*38fd1498Szrj 
2090*38fd1498Szrj 		  new_label = gen_label_rtx ();
2091*38fd1498Szrj 		  emit_label (new_label);
2092*38fd1498Szrj 
2093*38fd1498Szrj 		  gcc_assert (GET_CODE (old_label) == LABEL_REF);
2094*38fd1498Szrj 		  old_jump_target = old_jump_insn->jump_target ();
2095*38fd1498Szrj 		  new_jump = as_a <rtx_jump_insn *>
2096*38fd1498Szrj 		    (emit_jump_insn (targetm.gen_jump (old_jump_target)));
2097*38fd1498Szrj 		  new_jump->set_jump_target (old_jump_target);
2098*38fd1498Szrj 
2099*38fd1498Szrj 		  last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
2100*38fd1498Szrj 		  new_bb = create_basic_block (new_label, new_jump, last_bb);
2101*38fd1498Szrj 		  new_bb->aux = last_bb->aux;
2102*38fd1498Szrj 		  last_bb->aux = new_bb;
2103*38fd1498Szrj 
2104*38fd1498Szrj 		  emit_barrier_after_bb (new_bb);
2105*38fd1498Szrj 
2106*38fd1498Szrj 		  /* Make sure new bb is in same partition as source
2107*38fd1498Szrj 		     of conditional branch.  */
2108*38fd1498Szrj 		  BB_COPY_PARTITION (new_bb, cur_bb);
2109*38fd1498Szrj 		}
2110*38fd1498Szrj 
2111*38fd1498Szrj 	      /* Make old jump branch to new bb.  */
2112*38fd1498Szrj 
2113*38fd1498Szrj 	      redirect_jump (old_jump_insn, new_label, 0);
2114*38fd1498Szrj 
2115*38fd1498Szrj 	      /* Remove crossing_edge as predecessor of 'dest'.  */
2116*38fd1498Szrj 
2117*38fd1498Szrj 	      dest = crossing_edge->dest;
2118*38fd1498Szrj 
2119*38fd1498Szrj 	      redirect_edge_succ (crossing_edge, new_bb);
2120*38fd1498Szrj 
2121*38fd1498Szrj 	      /* Make a new edge from new_bb to old dest; new edge
2122*38fd1498Szrj 		 will be a successor for new_bb and a predecessor
2123*38fd1498Szrj 		 for 'dest'.  */
2124*38fd1498Szrj 
2125*38fd1498Szrj 	      if (EDGE_COUNT (new_bb->succs) == 0)
2126*38fd1498Szrj 		new_edge = make_single_succ_edge (new_bb, dest, 0);
2127*38fd1498Szrj 	      else
2128*38fd1498Szrj 		new_edge = EDGE_SUCC (new_bb, 0);
2129*38fd1498Szrj 
2130*38fd1498Szrj 	      crossing_edge->flags &= ~EDGE_CROSSING;
2131*38fd1498Szrj 	      new_edge->flags |= EDGE_CROSSING;
2132*38fd1498Szrj 	    }
2133*38fd1498Szrj 	}
2134*38fd1498Szrj     }
2135*38fd1498Szrj }
2136*38fd1498Szrj 
2137*38fd1498Szrj /* Find any unconditional branches that cross between hot and cold
2138*38fd1498Szrj    sections.  Convert them into indirect jumps instead.  */
2139*38fd1498Szrj 
2140*38fd1498Szrj static void
2141*38fd1498Szrj fix_crossing_unconditional_branches (void)
2142*38fd1498Szrj {
2143*38fd1498Szrj   basic_block cur_bb;
2144*38fd1498Szrj   rtx_insn *last_insn;
2145*38fd1498Szrj   rtx label;
2146*38fd1498Szrj   rtx label_addr;
2147*38fd1498Szrj   rtx_insn *indirect_jump_sequence;
2148*38fd1498Szrj   rtx_insn *jump_insn = NULL;
2149*38fd1498Szrj   rtx new_reg;
2150*38fd1498Szrj   rtx_insn *cur_insn;
2151*38fd1498Szrj   edge succ;
2152*38fd1498Szrj 
2153*38fd1498Szrj   FOR_EACH_BB_FN (cur_bb, cfun)
2154*38fd1498Szrj     {
2155*38fd1498Szrj       last_insn = BB_END (cur_bb);
2156*38fd1498Szrj 
2157*38fd1498Szrj       if (EDGE_COUNT (cur_bb->succs) < 1)
2158*38fd1498Szrj 	continue;
2159*38fd1498Szrj 
2160*38fd1498Szrj       succ = EDGE_SUCC (cur_bb, 0);
2161*38fd1498Szrj 
2162*38fd1498Szrj       /* Check to see if bb ends in a crossing (unconditional) jump.  At
2163*38fd1498Szrj 	 this point, no crossing jumps should be conditional.  */
2164*38fd1498Szrj 
2165*38fd1498Szrj       if (JUMP_P (last_insn)
2166*38fd1498Szrj 	  && (succ->flags & EDGE_CROSSING))
2167*38fd1498Szrj 	{
2168*38fd1498Szrj 	  gcc_assert (!any_condjump_p (last_insn));
2169*38fd1498Szrj 
2170*38fd1498Szrj 	  /* Make sure the jump is not already an indirect or table jump.  */
2171*38fd1498Szrj 
2172*38fd1498Szrj 	  if (!computed_jump_p (last_insn)
2173*38fd1498Szrj 	      && !tablejump_p (last_insn, NULL, NULL))
2174*38fd1498Szrj 	    {
2175*38fd1498Szrj 	      /* We have found a "crossing" unconditional branch.  Now
2176*38fd1498Szrj 		 we must convert it to an indirect jump.  First create a
2177*38fd1498Szrj 		 reference to the label, as the target for the jump.  */
2178*38fd1498Szrj 
2179*38fd1498Szrj 	      label = JUMP_LABEL (last_insn);
2180*38fd1498Szrj 	      label_addr = gen_rtx_LABEL_REF (Pmode, label);
2181*38fd1498Szrj 	      LABEL_NUSES (label) += 1;
2182*38fd1498Szrj 
2183*38fd1498Szrj 	      /* Get a register to use for the indirect jump.  */
2184*38fd1498Szrj 
2185*38fd1498Szrj 	      new_reg = gen_reg_rtx (Pmode);
2186*38fd1498Szrj 
2187*38fd1498Szrj 	      /* Generate the indirect jump sequence.  */
2188*38fd1498Szrj 
2189*38fd1498Szrj 	      start_sequence ();
2190*38fd1498Szrj 	      emit_move_insn (new_reg, label_addr);
2191*38fd1498Szrj 	      emit_indirect_jump (new_reg);
2192*38fd1498Szrj 	      indirect_jump_sequence = get_insns ();
2193*38fd1498Szrj 	      end_sequence ();
2194*38fd1498Szrj 
2195*38fd1498Szrj 	      /* Make sure every instruction in the new jump sequence has
2196*38fd1498Szrj 		 its basic block set to be cur_bb.  */
2197*38fd1498Szrj 
2198*38fd1498Szrj 	      for (cur_insn = indirect_jump_sequence; cur_insn;
2199*38fd1498Szrj 		   cur_insn = NEXT_INSN (cur_insn))
2200*38fd1498Szrj 		{
2201*38fd1498Szrj 		  if (!BARRIER_P (cur_insn))
2202*38fd1498Szrj 		    BLOCK_FOR_INSN (cur_insn) = cur_bb;
2203*38fd1498Szrj 		  if (JUMP_P (cur_insn))
2204*38fd1498Szrj 		    jump_insn = cur_insn;
2205*38fd1498Szrj 		}
2206*38fd1498Szrj 
2207*38fd1498Szrj 	      /* Insert the new (indirect) jump sequence immediately before
2208*38fd1498Szrj 		 the unconditional jump, then delete the unconditional jump.  */
2209*38fd1498Szrj 
2210*38fd1498Szrj 	      emit_insn_before (indirect_jump_sequence, last_insn);
2211*38fd1498Szrj 	      delete_insn (last_insn);
2212*38fd1498Szrj 
2213*38fd1498Szrj 	      JUMP_LABEL (jump_insn) = label;
2214*38fd1498Szrj 	      LABEL_NUSES (label)++;
2215*38fd1498Szrj 
2216*38fd1498Szrj 	      /* Make BB_END for cur_bb be the jump instruction (NOT the
2217*38fd1498Szrj 		 barrier instruction at the end of the sequence...).  */
2218*38fd1498Szrj 
2219*38fd1498Szrj 	      BB_END (cur_bb) = jump_insn;
2220*38fd1498Szrj 	    }
2221*38fd1498Szrj 	}
2222*38fd1498Szrj     }
2223*38fd1498Szrj }
2224*38fd1498Szrj 
2225*38fd1498Szrj /* Update CROSSING_JUMP_P flags on all jump insns.  */
2226*38fd1498Szrj 
2227*38fd1498Szrj static void
2228*38fd1498Szrj update_crossing_jump_flags (void)
2229*38fd1498Szrj {
2230*38fd1498Szrj   basic_block bb;
2231*38fd1498Szrj   edge e;
2232*38fd1498Szrj   edge_iterator ei;
2233*38fd1498Szrj 
2234*38fd1498Szrj   FOR_EACH_BB_FN (bb, cfun)
2235*38fd1498Szrj     FOR_EACH_EDGE (e, ei, bb->succs)
2236*38fd1498Szrj       if (e->flags & EDGE_CROSSING)
2237*38fd1498Szrj 	{
2238*38fd1498Szrj 	  if (JUMP_P (BB_END (bb)))
2239*38fd1498Szrj 	    CROSSING_JUMP_P (BB_END (bb)) = 1;
2240*38fd1498Szrj 	  break;
2241*38fd1498Szrj 	}
2242*38fd1498Szrj }
2243*38fd1498Szrj 
2244*38fd1498Szrj /* Reorder basic blocks using the software trace cache (STC) algorithm.  */
2245*38fd1498Szrj 
2246*38fd1498Szrj static void
2247*38fd1498Szrj reorder_basic_blocks_software_trace_cache (void)
2248*38fd1498Szrj {
2249*38fd1498Szrj   if (dump_file)
2250*38fd1498Szrj     fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");
2251*38fd1498Szrj 
2252*38fd1498Szrj   int n_traces;
2253*38fd1498Szrj   int i;
2254*38fd1498Szrj   struct trace *traces;
2255*38fd1498Szrj 
2256*38fd1498Szrj   /* We are estimating the length of uncond jump insn only once since the code
2257*38fd1498Szrj      for getting the insn length always returns the minimal length now.  */
2258*38fd1498Szrj   if (uncond_jump_length == 0)
2259*38fd1498Szrj     uncond_jump_length = get_uncond_jump_length ();
2260*38fd1498Szrj 
2261*38fd1498Szrj   /* We need to know some information for each basic block.  */
2262*38fd1498Szrj   array_size = GET_ARRAY_SIZE (last_basic_block_for_fn (cfun));
2263*38fd1498Szrj   bbd = XNEWVEC (bbro_basic_block_data, array_size);
2264*38fd1498Szrj   for (i = 0; i < array_size; i++)
2265*38fd1498Szrj     {
2266*38fd1498Szrj       bbd[i].start_of_trace = -1;
2267*38fd1498Szrj       bbd[i].end_of_trace = -1;
2268*38fd1498Szrj       bbd[i].in_trace = -1;
2269*38fd1498Szrj       bbd[i].visited = 0;
2270*38fd1498Szrj       bbd[i].priority = -1;
2271*38fd1498Szrj       bbd[i].heap = NULL;
2272*38fd1498Szrj       bbd[i].node = NULL;
2273*38fd1498Szrj     }
2274*38fd1498Szrj 
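  /* There can be at most one trace per basic block.  */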
2275*38fd1498Szrj   traces = XNEWVEC (struct trace, n_basic_blocks_for_fn (cfun));
2276*38fd1498Szrj   n_traces = 0;
2277*38fd1498Szrj   find_traces (&n_traces, traces);
2278*38fd1498Szrj   connect_traces (n_traces, traces);
2279*38fd1498Szrj   FREE (traces);
2280*38fd1498Szrj   FREE (bbd);
2281*38fd1498Szrj }
2282*38fd1498Szrj 
2283*38fd1498Szrj /* Return true if edge E1 is more desirable as a fallthrough edge than
2284*38fd1498Szrj    edge E2 is.  */
2285*38fd1498Szrj 
2286*38fd1498Szrj static bool
2287*38fd1498Szrj edge_order (edge e1, edge e2)
2288*38fd1498Szrj {
2289*38fd1498Szrj   return e1->count () > e2->count ();
2290*38fd1498Szrj }
2291*38fd1498Szrj 
2292*38fd1498Szrj /* Reorder basic blocks using the "simple" algorithm.  This tries to
2293*38fd1498Szrj    maximize the dynamic number of branches that are fallthrough, without
2294*38fd1498Szrj    copying instructions.  The algorithm is greedy, looking at the most
2295*38fd1498Szrj    frequently executed branch first.  */
2296*38fd1498Szrj 
2297*38fd1498Szrj static void
2298*38fd1498Szrj reorder_basic_blocks_simple (void)
2299*38fd1498Szrj {
2300*38fd1498Szrj   if (dump_file)
2301*38fd1498Szrj     fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");
2302*38fd1498Szrj 
2303*38fd1498Szrj   edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)];
2304*38fd1498Szrj 
2305*38fd1498Szrj   /* First, collect all edges that can be optimized by reordering blocks:
2306*38fd1498Szrj      simple jumps and conditional jumps, as well as the function entry edge.  */
2307*38fd1498Szrj 
2308*38fd1498Szrj   int n = 0;
2309*38fd1498Szrj   edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
2310*38fd1498Szrj 
2311*38fd1498Szrj   basic_block bb;
2312*38fd1498Szrj   FOR_EACH_BB_FN (bb, cfun)
2313*38fd1498Szrj     {
2314*38fd1498Szrj       rtx_insn *end = BB_END (bb);
2315*38fd1498Szrj 
2316*38fd1498Szrj       if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
2317*38fd1498Szrj 	continue;
2318*38fd1498Szrj 
2319*38fd1498Szrj       /* We cannot optimize asm goto.  */
2320*38fd1498Szrj       if (JUMP_P (end) && extract_asm_operands (end))
2321*38fd1498Szrj 	continue;
2322*38fd1498Szrj 
2323*38fd1498Szrj       if (single_succ_p (bb))
2324*38fd1498Szrj 	edges[n++] = EDGE_SUCC (bb, 0);
2325*38fd1498Szrj       else if (any_condjump_p (end))
2326*38fd1498Szrj 	{
2327*38fd1498Szrj 	  edge e0 = EDGE_SUCC (bb, 0);
2328*38fd1498Szrj 	  edge e1 = EDGE_SUCC (bb, 1);
2329*38fd1498Szrj 	  /* When optimizing for size it is best to keep the original
2330*38fd1498Szrj 	     fallthrough edges.  */
2331*38fd1498Szrj 	  if (e1->flags & EDGE_FALLTHRU)
2332*38fd1498Szrj 	    std::swap (e0, e1);
2333*38fd1498Szrj 	  edges[n++] = e0;
2334*38fd1498Szrj 	  edges[n++] = e1;
2335*38fd1498Szrj 	}
2336*38fd1498Szrj     }
2337*38fd1498Szrj 
2338*38fd1498Szrj   /* Sort the edges, the most desirable first.  When optimizing for size
2339*38fd1498Szrj      all edges are equally desirable.  */
2340*38fd1498Szrj 
2341*38fd1498Szrj   if (optimize_function_for_speed_p (cfun))
2342*38fd1498Szrj     std::stable_sort (edges, edges + n, edge_order);
2343*38fd1498Szrj 
2344*38fd1498Szrj   /* Now decide which of those edges to make fallthrough edges.  We set
2345*38fd1498Szrj      BB_VISITED if a block already has a fallthrough successor assigned
2346*38fd1498Szrj      to it.  We make ->AUX of an endpoint point to the opposite endpoint
2347*38fd1498Szrj      of a sequence of blocks that fall through, and ->AUX will be NULL
2348*38fd1498Szrj      for a block that is in such a sequence but not an endpoint anymore.
2349*38fd1498Szrj 
2350*38fd1498Szrj      To start with, everything points to itself, nothing is assigned yet.  */
2351*38fd1498Szrj 
2352*38fd1498Szrj   FOR_ALL_BB_FN (bb, cfun)
2353*38fd1498Szrj     {
2354*38fd1498Szrj       bb->aux = bb;
2355*38fd1498Szrj       bb->flags &= ~BB_VISITED;
2356*38fd1498Szrj     }
2357*38fd1498Szrj 
2358*38fd1498Szrj   EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
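  /* Edges into the exit block must never be chosen as fallthroughs;
     clearing its AUX makes the test below reject them (TAIL_B is NULL).  */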
2359*38fd1498Szrj 
2360*38fd1498Szrj   /* Now for all edges, the most desirable first, see if that edge can
2361*38fd1498Szrj      connect two sequences.  If it can, update AUX and BB_VISITED; if it
2362*38fd1498Szrj      cannot, zero out the edge in the table.  */
2363*38fd1498Szrj 
2364*38fd1498Szrj   for (int j = 0; j < n; j++)
2365*38fd1498Szrj     {
2366*38fd1498Szrj       edge e = edges[j];
2367*38fd1498Szrj 
2368*38fd1498Szrj       basic_block tail_a = e->src;
2369*38fd1498Szrj       basic_block head_b = e->dest;
2370*38fd1498Szrj       basic_block head_a = (basic_block) tail_a->aux;
2371*38fd1498Szrj       basic_block tail_b = (basic_block) head_b->aux;
2372*38fd1498Szrj 
2373*38fd1498Szrj       /* An edge cannot connect two sequences if:
2374*38fd1498Szrj 	 - it crosses partitions;
2375*38fd1498Szrj 	 - its src is not a current endpoint;
2376*38fd1498Szrj 	 - its dest is not a current endpoint;
2377*38fd1498Szrj 	 - or, it would create a loop.  */
2378*38fd1498Szrj 
2379*38fd1498Szrj       if (e->flags & EDGE_CROSSING
2380*38fd1498Szrj 	  || tail_a->flags & BB_VISITED
2381*38fd1498Szrj 	  || !tail_b
2382*38fd1498Szrj 	  || (!(head_b->flags & BB_VISITED) && head_b != tail_b)
2383*38fd1498Szrj 	  || tail_a == tail_b)
2384*38fd1498Szrj 	{
2385*38fd1498Szrj 	  edges[j] = 0;
2386*38fd1498Szrj 	  continue;
2387*38fd1498Szrj 	}
2388*38fd1498Szrj 
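      /* The edge joins two sequences: the one ending at TAIL_A now falls
	 through to the one starting at HEAD_B, leaving HEAD_A and TAIL_B
	 as the endpoints of the combined sequence.  */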
2389*38fd1498Szrj       tail_a->aux = 0;
2390*38fd1498Szrj       head_b->aux = 0;
2391*38fd1498Szrj       head_a->aux = tail_b;
2392*38fd1498Szrj       tail_b->aux = head_a;
2393*38fd1498Szrj       tail_a->flags |= BB_VISITED;
2394*38fd1498Szrj     }
2395*38fd1498Szrj 
2396*38fd1498Szrj   /* Put the pieces together, in the same order that the start blocks of
2397*38fd1498Szrj      the sequences already had.  The hot/cold partitioning gives a little
2398*38fd1498Szrj      complication: as a first pass only do this for blocks in the same
2399*38fd1498Szrj      partition as the start block, and (if there is anything left to do)
2400*38fd1498Szrj      in a second pass handle the other partition.  */
2401*38fd1498Szrj 
2402*38fd1498Szrj   basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
2403*38fd1498Szrj 
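  /* The entry block belongs to neither partition, so when nothing has been
     chained after it yet start from the partition of its successor;
     otherwise use the partition of the chain's current tail.  */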
2404*38fd1498Szrj   int current_partition
2405*38fd1498Szrj     = BB_PARTITION (last_tail == ENTRY_BLOCK_PTR_FOR_FN (cfun)
2406*38fd1498Szrj 		    ? EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0)->dest
2407*38fd1498Szrj 		    : last_tail);
2408*38fd1498Szrj   bool need_another_pass = true;
2409*38fd1498Szrj 
2410*38fd1498Szrj   for (int pass = 0; pass < 2 && need_another_pass; pass++)
2411*38fd1498Szrj     {
2412*38fd1498Szrj       need_another_pass = false;
2413*38fd1498Szrj 
2414*38fd1498Szrj       FOR_EACH_BB_FN (bb, cfun)
2415*38fd1498Szrj 	if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb)
2416*38fd1498Szrj 	  {
2417*38fd1498Szrj 	    if (BB_PARTITION (bb) != current_partition)
2418*38fd1498Szrj 	      {
2419*38fd1498Szrj 		need_another_pass = true;
2420*38fd1498Szrj 		continue;
2421*38fd1498Szrj 	      }
2422*38fd1498Szrj 
2423*38fd1498Szrj 	    last_tail->aux = bb;
2424*38fd1498Szrj 	    last_tail = (basic_block) bb->aux;
2425*38fd1498Szrj 	  }
2426*38fd1498Szrj 
2427*38fd1498Szrj       current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
2428*38fd1498Szrj     }
2429*38fd1498Szrj 
2430*38fd1498Szrj   last_tail->aux = 0;
2431*38fd1498Szrj 
2432*38fd1498Szrj   /* Finally, link all the chosen fallthrough edges.  */
2433*38fd1498Szrj 
2434*38fd1498Szrj   for (int j = 0; j < n; j++)
2435*38fd1498Szrj     if (edges[j])
2436*38fd1498Szrj       edges[j]->src->aux = edges[j]->dest;
2437*38fd1498Szrj 
2438*38fd1498Szrj   delete[] edges;
2439*38fd1498Szrj 
2440*38fd1498Szrj   /* If the entry edge no longer falls through we have to make a new
2441*38fd1498Szrj      block so it can do so again.  */
2442*38fd1498Szrj 
2443*38fd1498Szrj   edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
2444*38fd1498Szrj   if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
2445*38fd1498Szrj     {
2446*38fd1498Szrj       force_nonfallthru (e);
2447*38fd1498Szrj       e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
2448*38fd1498Szrj     }
2449*38fd1498Szrj }
2450*38fd1498Szrj 
2451*38fd1498Szrj /* Reorder basic blocks.  The main entry point to this file.  */
2452*38fd1498Szrj 
2453*38fd1498Szrj static void
2454*38fd1498Szrj reorder_basic_blocks (void)
2455*38fd1498Szrj {
2456*38fd1498Szrj   gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);
2457*38fd1498Szrj 
2458*38fd1498Szrj   if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
2459*38fd1498Szrj     return;
2460*38fd1498Szrj 
2461*38fd1498Szrj   set_edge_can_fallthru_flag ();
2462*38fd1498Szrj   mark_dfs_back_edges ();
2463*38fd1498Szrj 
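  /* Assumed mapping (not spelled out in this file):
     flag_reorder_blocks_algorithm reflects the -freorder-blocks-algorithm=
     option, whose "simple" and "stc" values select between the two cases
     below.  */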
2464*38fd1498Szrj   switch (flag_reorder_blocks_algorithm)
2465*38fd1498Szrj     {
2466*38fd1498Szrj     case REORDER_BLOCKS_ALGORITHM_SIMPLE:
2467*38fd1498Szrj       reorder_basic_blocks_simple ();
2468*38fd1498Szrj       break;
2469*38fd1498Szrj 
2470*38fd1498Szrj     case REORDER_BLOCKS_ALGORITHM_STC:
2471*38fd1498Szrj       reorder_basic_blocks_software_trace_cache ();
2472*38fd1498Szrj       break;
2473*38fd1498Szrj 
2474*38fd1498Szrj     default:
2475*38fd1498Szrj       gcc_unreachable ();
2476*38fd1498Szrj     }
2477*38fd1498Szrj 
2478*38fd1498Szrj   relink_block_chain (/*stay_in_cfglayout_mode=*/true);
2479*38fd1498Szrj 
2480*38fd1498Szrj   if (dump_file)
2481*38fd1498Szrj     {
2482*38fd1498Szrj       if (dump_flags & TDF_DETAILS)
2483*38fd1498Szrj 	dump_reg_info (dump_file);
2484*38fd1498Szrj       dump_flow_info (dump_file, dump_flags);
2485*38fd1498Szrj     }
2486*38fd1498Szrj 
2487*38fd1498Szrj   /* Signal that rtl_verify_flow_info_1 can now verify that there
2488*38fd1498Szrj      is at most one switch between hot/cold sections.  */
2489*38fd1498Szrj   crtl->bb_reorder_complete = true;
2490*38fd1498Szrj }
2491*38fd1498Szrj 
2492*38fd1498Szrj /* Determine which partition the first basic block in the function
2493*38fd1498Szrj    belongs to, then find the first basic block in the current function
2494*38fd1498Szrj    that belongs to a different section, and insert a
2495*38fd1498Szrj    NOTE_INSN_SWITCH_TEXT_SECTIONS note immediately before it in the
2496*38fd1498Szrj    instruction stream.  When writing out the assembly code,
2497*38fd1498Szrj    encountering this note will make the compiler switch between the
2498*38fd1498Szrj    hot and cold text sections.  */
2499*38fd1498Szrj 
2500*38fd1498Szrj void
2501*38fd1498Szrj insert_section_boundary_note (void)
2502*38fd1498Szrj {
2503*38fd1498Szrj   basic_block bb;
2504*38fd1498Szrj   bool switched_sections = false;
2505*38fd1498Szrj   int current_partition = 0;
2506*38fd1498Szrj 
2507*38fd1498Szrj   if (!crtl->has_bb_partition)
2508*38fd1498Szrj     return;
2509*38fd1498Szrj 
2510*38fd1498Szrj   FOR_EACH_BB_FN (bb, cfun)
2511*38fd1498Szrj     {
2512*38fd1498Szrj       if (!current_partition)
2513*38fd1498Szrj 	current_partition = BB_PARTITION (bb);
2514*38fd1498Szrj       if (BB_PARTITION (bb) != current_partition)
2515*38fd1498Szrj 	{
2516*38fd1498Szrj 	  gcc_assert (!switched_sections);
2517*38fd1498Szrj 	  switched_sections = true;
2518*38fd1498Szrj 	  emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
2519*38fd1498Szrj 	  current_partition = BB_PARTITION (bb);
2520*38fd1498Szrj 	}
2521*38fd1498Szrj     }
2522*38fd1498Szrj 
2523*38fd1498Szrj   /* Make sure crtl->has_bb_partition matches reality even if bbpart found
2524*38fd1498Szrj      both hot and cold basic blocks but one of those kinds was later
2525*38fd1498Szrj      optimized away.  */
2526*38fd1498Szrj   crtl->has_bb_partition = switched_sections;
2527*38fd1498Szrj }
2528*38fd1498Szrj 
2529*38fd1498Szrj namespace {
2530*38fd1498Szrj 
2531*38fd1498Szrj const pass_data pass_data_reorder_blocks =
2532*38fd1498Szrj {
2533*38fd1498Szrj   RTL_PASS, /* type */
2534*38fd1498Szrj   "bbro", /* name */
2535*38fd1498Szrj   OPTGROUP_NONE, /* optinfo_flags */
2536*38fd1498Szrj   TV_REORDER_BLOCKS, /* tv_id */
2537*38fd1498Szrj   0, /* properties_required */
2538*38fd1498Szrj   0, /* properties_provided */
2539*38fd1498Szrj   0, /* properties_destroyed */
2540*38fd1498Szrj   0, /* todo_flags_start */
2541*38fd1498Szrj   0, /* todo_flags_finish */
2542*38fd1498Szrj };
2543*38fd1498Szrj 
2544*38fd1498Szrj class pass_reorder_blocks : public rtl_opt_pass
2545*38fd1498Szrj {
2546*38fd1498Szrj public:
2547*38fd1498Szrj   pass_reorder_blocks (gcc::context *ctxt)
2548*38fd1498Szrj     : rtl_opt_pass (pass_data_reorder_blocks, ctxt)
2549*38fd1498Szrj   {}
2550*38fd1498Szrj 
2551*38fd1498Szrj   /* opt_pass methods: */
2552*38fd1498Szrj   virtual bool gate (function *)
2553*38fd1498Szrj     {
2554*38fd1498Szrj       if (targetm.cannot_modify_jumps_p ())
2555*38fd1498Szrj 	return false;
2556*38fd1498Szrj       return (optimize > 0
2557*38fd1498Szrj 	      && (flag_reorder_blocks || flag_reorder_blocks_and_partition));
2558*38fd1498Szrj     }
2559*38fd1498Szrj 
2560*38fd1498Szrj   virtual unsigned int execute (function *);
2561*38fd1498Szrj 
2562*38fd1498Szrj }; // class pass_reorder_blocks
2563*38fd1498Szrj 
2564*38fd1498Szrj unsigned int
2565*38fd1498Szrj pass_reorder_blocks::execute (function *fun)
2566*38fd1498Szrj {
2567*38fd1498Szrj   basic_block bb;
2568*38fd1498Szrj 
2569*38fd1498Szrj   /* Last attempt to optimize CFG, as scheduling, peepholing and insn
2570*38fd1498Szrj      splitting possibly introduced more crossjumping opportunities.  */
2571*38fd1498Szrj   cfg_layout_initialize (CLEANUP_EXPENSIVE);
2572*38fd1498Szrj 
2573*38fd1498Szrj   reorder_basic_blocks ();
2574*38fd1498Szrj   cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_NO_PARTITIONING);
2575*38fd1498Szrj 
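  /* Record the current next_bb order in the AUX chain so that
     cfg_layout_finalize keeps the ordering reorder_basic_blocks already
     established (assuming, as elsewhere in cfglayout mode, that the AUX
     chain is read back as the desired block order).  */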
2576*38fd1498Szrj   FOR_EACH_BB_FN (bb, fun)
2577*38fd1498Szrj     if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
2578*38fd1498Szrj       bb->aux = bb->next_bb;
2579*38fd1498Szrj   cfg_layout_finalize ();
2580*38fd1498Szrj 
2581*38fd1498Szrj   return 0;
2582*38fd1498Szrj }
2583*38fd1498Szrj 
2584*38fd1498Szrj } // anon namespace
2585*38fd1498Szrj 
2586*38fd1498Szrj rtl_opt_pass *
2587*38fd1498Szrj make_pass_reorder_blocks (gcc::context *ctxt)
2588*38fd1498Szrj {
2589*38fd1498Szrj   return new pass_reorder_blocks (ctxt);
2590*38fd1498Szrj }
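/* A minimal sketch, assuming the usual pass-manager plumbing (none of it is
   in this file): the factory above is declared in tree-pass.h and the pass
   is scheduled by an entry in passes.def along the lines of

	NEXT_PASS (pass_reorder_blocks);

   which is how reorder_basic_blocks ends up running late in the RTL
   pipeline.  */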
2591*38fd1498Szrj 
2592*38fd1498Szrj /* Duplicate a block (that we already know ends in a computed jump) into its
2593*38fd1498Szrj    predecessors, where possible.  Return whether anything is changed.  */
2594*38fd1498Szrj static bool
2595*38fd1498Szrj maybe_duplicate_computed_goto (basic_block bb, int max_size)
2596*38fd1498Szrj {
2597*38fd1498Szrj   if (single_pred_p (bb))
2598*38fd1498Szrj     return false;
2599*38fd1498Szrj 
2600*38fd1498Szrj   /* Make sure that the block is small enough.  */
2601*38fd1498Szrj   rtx_insn *insn;
2602*38fd1498Szrj   FOR_BB_INSNS (bb, insn)
2603*38fd1498Szrj     if (INSN_P (insn))
2604*38fd1498Szrj       {
2605*38fd1498Szrj 	max_size -= get_attr_min_length (insn);
2606*38fd1498Szrj 	if (max_size < 0)
2607*38fd1498Szrj 	  return false;
2608*38fd1498Szrj       }
2609*38fd1498Szrj 
2610*38fd1498Szrj   bool changed = false;
2611*38fd1498Szrj   edge e;
2612*38fd1498Szrj   edge_iterator ei;
2613*38fd1498Szrj   for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
2614*38fd1498Szrj     {
2615*38fd1498Szrj       basic_block pred = e->src;
2616*38fd1498Szrj 
2617*38fd1498Szrj       /* Do not duplicate BB into PRED if PRED is the only remaining
2618*38fd1498Szrj 	 predecessor, or if we cannot merge a copy of BB with PRED.  */
2619*38fd1498Szrj       if (single_pred_p (bb)
2620*38fd1498Szrj 	  || !single_succ_p (pred)
2621*38fd1498Szrj 	  || e->flags & EDGE_COMPLEX
2622*38fd1498Szrj 	  || pred->index < NUM_FIXED_BLOCKS
2623*38fd1498Szrj 	  || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
2624*38fd1498Szrj 	  || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
2625*38fd1498Szrj 	{
2626*38fd1498Szrj 	  ei_next (&ei);
2627*38fd1498Szrj 	  continue;
2628*38fd1498Szrj 	}
2629*38fd1498Szrj 
2630*38fd1498Szrj       if (dump_file)
2631*38fd1498Szrj 	fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n",
2632*38fd1498Szrj 		 bb->index, e->src->index);
2633*38fd1498Szrj 
2634*38fd1498Szrj       /* Remember if PRED can be duplicated; if so, the copy of BB merged
2635*38fd1498Szrj 	 with PRED can be duplicated as well.  */
2636*38fd1498Szrj       bool can_dup_more = can_duplicate_block_p (pred);
2637*38fd1498Szrj 
2638*38fd1498Szrj       /* Make a copy of BB, merge it into PRED.  */
2639*38fd1498Szrj       basic_block copy = duplicate_block (bb, e, NULL);
2640*38fd1498Szrj       emit_barrier_after_bb (copy);
2641*38fd1498Szrj       reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
2642*38fd1498Szrj       merge_blocks (pred, copy);
2643*38fd1498Szrj 
2644*38fd1498Szrj       changed = true;
2645*38fd1498Szrj 
2646*38fd1498Szrj       /* Try to merge the resulting merged PRED into further predecessors.  */
2647*38fd1498Szrj       if (can_dup_more)
2648*38fd1498Szrj 	maybe_duplicate_computed_goto (pred, max_size);
2649*38fd1498Szrj     }
2650*38fd1498Szrj 
2651*38fd1498Szrj   return changed;
2652*38fd1498Szrj }
2653*38fd1498Szrj 
2654*38fd1498Szrj /* Duplicate the blocks containing computed gotos.  This basically unfactors
2655*38fd1498Szrj    computed gotos that were factored early on in the compilation process to
2656*38fd1498Szrj    speed up edge based data flow.  We used to not unfactor them again, which
2657*38fd1498Szrj    can seriously pessimize code with many computed jumps in the source code,
2658*38fd1498Szrj    such as interpreters.  See e.g. PR15242.  */
2659*38fd1498Szrj static void
2660*38fd1498Szrj duplicate_computed_gotos (function *fun)
2661*38fd1498Szrj {
2662*38fd1498Szrj   /* We estimate the length of an unconditional jump insn only once,
2663*38fd1498Szrj      since the code for getting the insn length always returns
2664*38fd1498Szrj      the minimal length now.  */
2665*38fd1498Szrj   if (uncond_jump_length == 0)
2666*38fd1498Szrj     uncond_jump_length = get_uncond_jump_length ();
2667*38fd1498Szrj 
2668*38fd1498Szrj   /* Never copy a block larger than this.  */
2669*38fd1498Szrj   int max_size
2670*38fd1498Szrj     = uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS);
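  /* Assumed mapping: PARAM_MAX_GOTO_DUPLICATION_INSNS corresponds to the
     --param max-goto-duplication-insns knob, so the cap is roughly that
     many unconditional-jump-sized insns.  */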
2671*38fd1498Szrj 
2672*38fd1498Szrj   bool changed = false;
2673*38fd1498Szrj 
2674*38fd1498Szrj   /* Try to duplicate all blocks that end in a computed jump and that
2675*38fd1498Szrj      can be duplicated at all.  */
2676*38fd1498Szrj   basic_block bb;
2677*38fd1498Szrj   FOR_EACH_BB_FN (bb, fun)
2678*38fd1498Szrj     if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
2679*38fd1498Szrj       changed |= maybe_duplicate_computed_goto (bb, max_size);
2680*38fd1498Szrj 
2681*38fd1498Szrj   /* Duplicating blocks will redirect edges and may cause hot blocks
2682*38fd1498Szrj      previously reached by both hot and cold blocks to become dominated
2683*38fd1498Szrj      only by cold blocks.  */
2684*38fd1498Szrj   if (changed)
2685*38fd1498Szrj     fixup_partitions ();
2686*38fd1498Szrj }
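/* Purely illustrative sketch, not part of the original sources, of the kind
   of code this pass targets: a threaded interpreter whose dispatch is a
   computed goto.  Early passes factor every "goto *dispatch[...]" into one
   shared block; this pass copies that block back into each handler so each
   one ends in its own indirect jump again, which typically helps indirect
   branch prediction (cf. PR15242 above):

	static void *dispatch[] = { &&op_add, &&op_sub, &&op_halt };
	...
      op_add:
	...
	goto *dispatch[*pc++];
      op_sub:
	...
	goto *dispatch[*pc++];
*/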
2687*38fd1498Szrj 
2688*38fd1498Szrj namespace {
2689*38fd1498Szrj 
2690*38fd1498Szrj const pass_data pass_data_duplicate_computed_gotos =
2691*38fd1498Szrj {
2692*38fd1498Szrj   RTL_PASS, /* type */
2693*38fd1498Szrj   "compgotos", /* name */
2694*38fd1498Szrj   OPTGROUP_NONE, /* optinfo_flags */
2695*38fd1498Szrj   TV_REORDER_BLOCKS, /* tv_id */
2696*38fd1498Szrj   0, /* properties_required */
2697*38fd1498Szrj   0, /* properties_provided */
2698*38fd1498Szrj   0, /* properties_destroyed */
2699*38fd1498Szrj   0, /* todo_flags_start */
2700*38fd1498Szrj   0, /* todo_flags_finish */
2701*38fd1498Szrj };
2702*38fd1498Szrj 
2703*38fd1498Szrj class pass_duplicate_computed_gotos : public rtl_opt_pass
2704*38fd1498Szrj {
2705*38fd1498Szrj public:
2706*38fd1498Szrj   pass_duplicate_computed_gotos (gcc::context *ctxt)
2707*38fd1498Szrj     : rtl_opt_pass (pass_data_duplicate_computed_gotos, ctxt)
2708*38fd1498Szrj   {}
2709*38fd1498Szrj 
2710*38fd1498Szrj   /* opt_pass methods: */
2711*38fd1498Szrj   virtual bool gate (function *);
2712*38fd1498Szrj   virtual unsigned int execute (function *);
2713*38fd1498Szrj 
2714*38fd1498Szrj }; // class pass_duplicate_computed_gotos
2715*38fd1498Szrj 
2716*38fd1498Szrj bool
2717*38fd1498Szrj pass_duplicate_computed_gotos::gate (function *fun)
2718*38fd1498Szrj {
2719*38fd1498Szrj   if (targetm.cannot_modify_jumps_p ())
2720*38fd1498Szrj     return false;
2721*38fd1498Szrj   return (optimize > 0
2722*38fd1498Szrj 	  && flag_expensive_optimizations
2723*38fd1498Szrj 	  && ! optimize_function_for_size_p (fun));
2724*38fd1498Szrj }
2725*38fd1498Szrj 
2726*38fd1498Szrj unsigned int
2727*38fd1498Szrj pass_duplicate_computed_gotos::execute (function *fun)
2728*38fd1498Szrj {
2729*38fd1498Szrj   duplicate_computed_gotos (fun);
2730*38fd1498Szrj 
2731*38fd1498Szrj   return 0;
2732*38fd1498Szrj }
2733*38fd1498Szrj 
2734*38fd1498Szrj } // anon namespace
2735*38fd1498Szrj 
2736*38fd1498Szrj rtl_opt_pass *
2737*38fd1498Szrj make_pass_duplicate_computed_gotos (gcc::context *ctxt)
2738*38fd1498Szrj {
2739*38fd1498Szrj   return new pass_duplicate_computed_gotos (ctxt);
2740*38fd1498Szrj }
2741*38fd1498Szrj 
2742*38fd1498Szrj /* This function is the main 'entrance' for the optimization that
2743*38fd1498Szrj    partitions hot and cold basic blocks into separate sections of the
2744*38fd1498Szrj    .o file (to improve performance and cache locality).  Ideally it
2745*38fd1498Szrj    would be called after all optimizations that rearrange the CFG have
2746*38fd1498Szrj    run.  However, part of this optimization may introduce new
2747*38fd1498Szrj    register usage, so it must be called before register allocation has
2748*38fd1498Szrj    occurred.  This means that this optimization is actually called
2749*38fd1498Szrj    well before the optimization that reorders basic blocks (see
2750*38fd1498Szrj    function above).
2751*38fd1498Szrj 
2752*38fd1498Szrj    This optimization checks the feedback information to determine
2753*38fd1498Szrj    which basic blocks are hot/cold, updates flags on the basic blocks
2754*38fd1498Szrj    to indicate which section they belong in.  This information is
2755*38fd1498Szrj    later used for writing out sections in the .o file.  Because hot
2756*38fd1498Szrj    and cold sections can be arbitrarily large (within the bounds of
2757*38fd1498Szrj    memory), far beyond the size of a single function, it is necessary
2758*38fd1498Szrj    to fix up all edges that cross section boundaries, to make sure the
2759*38fd1498Szrj    instructions used can actually span the required distance.  The
2760*38fd1498Szrj    fixes are described below.
2761*38fd1498Szrj 
2762*38fd1498Szrj    Fall-through edges must be changed into jumps; it is not safe or
2763*38fd1498Szrj    legal to fall through across a section boundary.  Whenever a
2764*38fd1498Szrj    fall-through edge crossing a section boundary is encountered, a new
2765*38fd1498Szrj    basic block is inserted (in the same section as the fall-through
2766*38fd1498Szrj    source), and the fall through edge is redirected to the new basic
2767*38fd1498Szrj    block.  The new basic block contains an unconditional jump to the
2768*38fd1498Szrj    original fall-through target.  (If the unconditional jump is
2769*38fd1498Szrj    insufficient to cross section boundaries, that is dealt with a
2770*38fd1498Szrj    little later, see below).
2771*38fd1498Szrj 
2772*38fd1498Szrj    In order to deal with architectures that have short conditional
2773*38fd1498Szrj    branches (which cannot span all of memory) we take any conditional
2774*38fd1498Szrj    jump that attempts to cross a section boundary and add a level of
2775*38fd1498Szrj    indirection: it becomes a conditional jump to a new basic block, in
2776*38fd1498Szrj    the same section.  The new basic block contains an unconditional
2777*38fd1498Szrj    jump to the original target, in the other section.
2778*38fd1498Szrj 
2779*38fd1498Szrj    For those architectures whose unconditional branch is also
2780*38fd1498Szrj    incapable of reaching all of memory, those unconditional jumps are
2781*38fd1498Szrj    converted into indirect jumps, through a register.
2782*38fd1498Szrj 
2783*38fd1498Szrj    IMPORTANT NOTE: This optimization causes some messy interactions
2784*38fd1498Szrj    with the cfg cleanup optimizations; those optimizations want to
2785*38fd1498Szrj    merge blocks wherever possible, and to collapse indirect jump
2786*38fd1498Szrj    sequences (change "A jumps to B jumps to C" directly into "A jumps
2787*38fd1498Szrj    to C").  Those optimizations can undo the jump fixes that
2788*38fd1498Szrj    partitioning is required to make (see above), in order to ensure
2789*38fd1498Szrj    that jumps attempting to cross section boundaries are really able
2790*38fd1498Szrj    to cover whatever distance the jump requires (on many architectures
2791*38fd1498Szrj    conditional or unconditional jumps are not able to reach all of
2792*38fd1498Szrj    memory).  Therefore tests have to be inserted into each such
2793*38fd1498Szrj    optimization to make sure that it does not undo stuff necessary to
2794*38fd1498Szrj    cross partition boundaries.  This would be much less of a problem
2795*38fd1498Szrj    if we could perform this optimization later in the compilation, but
2796*38fd1498Szrj    unfortunately the fact that we may need to create indirect jumps
2797*38fd1498Szrj    (through registers) requires that this optimization be performed
2798*38fd1498Szrj    before register allocation.
2799*38fd1498Szrj 
2800*38fd1498Szrj    Hot and cold basic blocks are partitioned and put in separate
2801*38fd1498Szrj    sections of the .o file, to reduce paging and improve cache
2802*38fd1498Szrj    performance (hopefully).  This can result in bits of code from the
2803*38fd1498Szrj    same function being widely separated in the .o file.  However, this
2804*38fd1498Szrj    is not apparent from the current bb structure.  Therefore we must take
2805*38fd1498Szrj    care to ensure that: 1). There are no fall_thru edges that cross
2806*38fd1498Szrj    between sections; 2). For those architectures which have "short"
2807*38fd1498Szrj    conditional branches, all conditional branches that attempt to
2808*38fd1498Szrj    cross between sections are converted to unconditional branches;
2809*38fd1498Szrj    and, 3). For those architectures which have "short" unconditional
2810*38fd1498Szrj    branches, all unconditional branches that attempt to cross between
2811*38fd1498Szrj    sections are converted to indirect jumps.
2812*38fd1498Szrj 
2813*38fd1498Szrj    The code for fixing up fall_thru edges that cross between hot and
2814*38fd1498Szrj    cold basic blocks does so by creating new basic blocks containing
2815*38fd1498Szrj    unconditional branches to the appropriate label in the "other"
2816*38fd1498Szrj    section.  The new basic block is then put in the same (hot or cold)
2817*38fd1498Szrj    section as the original conditional branch, and the fall_thru edge
2818*38fd1498Szrj    is modified to fall into the new basic block instead.  By adding
2819*38fd1498Szrj    this level of indirection we end up with only unconditional branches
2820*38fd1498Szrj    crossing between hot and cold sections.
2821*38fd1498Szrj 
2822*38fd1498Szrj    Conditional branches are dealt with by adding a level of indirection.
2823*38fd1498Szrj    A new basic block is added in the same (hot/cold) section as the
2824*38fd1498Szrj    conditional branch, and the conditional branch is retargeted to the
2825*38fd1498Szrj    new basic block.  The new basic block contains an unconditional branch
2826*38fd1498Szrj    to the original target of the conditional branch (in the other section).
2827*38fd1498Szrj 
2828*38fd1498Szrj    Unconditional branches are dealt with by converting them into
2829*38fd1498Szrj    indirect jumps.  */
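/* A purely illustrative before/after sketch (editorial, not from the
   original sources) of the fall_thru fix-up described above, with "bb1a"
   a hypothetical name for the newly created block:

	before:  hot:  bb1 --(fall_thru, crossing)--> cold: bb2

	after:   hot:  bb1 --(fall_thru)--> bb1a: jump to bb2
		 cold: bb2    (the unconditional jump in bb1a is now the
			       only edge that crosses sections here)  */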
2830*38fd1498Szrj 
2831*38fd1498Szrj namespace {
2832*38fd1498Szrj 
2833*38fd1498Szrj const pass_data pass_data_partition_blocks =
2834*38fd1498Szrj {
2835*38fd1498Szrj   RTL_PASS, /* type */
2836*38fd1498Szrj   "bbpart", /* name */
2837*38fd1498Szrj   OPTGROUP_NONE, /* optinfo_flags */
2838*38fd1498Szrj   TV_REORDER_BLOCKS, /* tv_id */
2839*38fd1498Szrj   PROP_cfglayout, /* properties_required */
2840*38fd1498Szrj   0, /* properties_provided */
2841*38fd1498Szrj   0, /* properties_destroyed */
2842*38fd1498Szrj   0, /* todo_flags_start */
2843*38fd1498Szrj   0, /* todo_flags_finish */
2844*38fd1498Szrj };
2845*38fd1498Szrj 
2846*38fd1498Szrj class pass_partition_blocks : public rtl_opt_pass
2847*38fd1498Szrj {
2848*38fd1498Szrj public:
2849*38fd1498Szrj   pass_partition_blocks (gcc::context *ctxt)
2850*38fd1498Szrj     : rtl_opt_pass (pass_data_partition_blocks, ctxt)
2851*38fd1498Szrj   {}
2852*38fd1498Szrj 
2853*38fd1498Szrj   /* opt_pass methods: */
2854*38fd1498Szrj   virtual bool gate (function *);
2855*38fd1498Szrj   virtual unsigned int execute (function *);
2856*38fd1498Szrj 
2857*38fd1498Szrj }; // class pass_partition_blocks
2858*38fd1498Szrj 
2859*38fd1498Szrj bool
2860*38fd1498Szrj pass_partition_blocks::gate (function *fun)
2861*38fd1498Szrj {
2862*38fd1498Szrj   /* The optimization to partition hot/cold basic blocks into separate
2863*38fd1498Szrj      sections of the .o file does not work well with linkonce or with
2864*38fd1498Szrj      user defined section attributes.  Don't call it if either case
2865*38fd1498Szrj      arises.  */
2866*38fd1498Szrj   return (flag_reorder_blocks_and_partition
2867*38fd1498Szrj 	  && optimize
2868*38fd1498Szrj 	  /* See pass_reorder_blocks::gate.  We should not partition if
2869*38fd1498Szrj 	     we are going to omit the reordering.  */
2870*38fd1498Szrj 	  && optimize_function_for_speed_p (fun)
2871*38fd1498Szrj 	  && !DECL_COMDAT_GROUP (current_function_decl)
2872*38fd1498Szrj 	  && !lookup_attribute ("section", DECL_ATTRIBUTES (fun->decl))
2873*38fd1498Szrj 	  /* Workaround a bug in GDB where read_partial_die doesn't cope
2874*38fd1498Szrj 	     with DIEs with DW_AT_ranges, see PR81115.  */
2875*38fd1498Szrj 	  && !(in_lto_p && MAIN_NAME_P (DECL_NAME (fun->decl))));
2876*38fd1498Szrj }
2877*38fd1498Szrj 
2878*38fd1498Szrj unsigned
2879*38fd1498Szrj pass_partition_blocks::execute (function *fun)
2880*38fd1498Szrj {
2881*38fd1498Szrj   vec<edge> crossing_edges;
2882*38fd1498Szrj 
2883*38fd1498Szrj   if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
2884*38fd1498Szrj     return 0;
2885*38fd1498Szrj 
2886*38fd1498Szrj   df_set_flags (DF_DEFER_INSN_RESCAN);
2887*38fd1498Szrj 
2888*38fd1498Szrj   crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges ();
2889*38fd1498Szrj   if (!crossing_edges.exists ())
2890*38fd1498Szrj     /* Make sure to process deferred rescans and clear changeable df flags.  */
2891*38fd1498Szrj     return TODO_df_finish;
2892*38fd1498Szrj 
2893*38fd1498Szrj   crtl->has_bb_partition = true;
2894*38fd1498Szrj 
2895*38fd1498Szrj   /* Make sure the source of any crossing edge ends in a jump and the
2896*38fd1498Szrj      destination of any crossing edge has a label.  */
2897*38fd1498Szrj   add_labels_and_missing_jumps (crossing_edges);
2898*38fd1498Szrj 
2899*38fd1498Szrj   /* Convert each crossing fall_thru edge into a non-crossing fall
2900*38fd1498Szrj      thru to a new block holding an unconditional jump (that jumps to
2901*38fd1498Szrj      the original fall through dest).  */
2902*38fd1498Szrj   fix_up_fall_thru_edges ();
2903*38fd1498Szrj 
2904*38fd1498Szrj   /* If the architecture does not have conditional branches that can
2905*38fd1498Szrj      span all of memory, convert crossing conditional branches into
2906*38fd1498Szrj      crossing unconditional branches.  */
2907*38fd1498Szrj   if (!HAS_LONG_COND_BRANCH)
2908*38fd1498Szrj     fix_crossing_conditional_branches ();
2909*38fd1498Szrj 
2910*38fd1498Szrj   /* If the architecture does not have unconditional branches that
2911*38fd1498Szrj      can span all of memory, convert crossing unconditional branches
2912*38fd1498Szrj      into indirect jumps.  Since adding an indirect jump also adds
2913*38fd1498Szrj      a new register usage, update the register usage information as
2914*38fd1498Szrj      well.  */
2915*38fd1498Szrj   if (!HAS_LONG_UNCOND_BRANCH)
2916*38fd1498Szrj     fix_crossing_unconditional_branches ();
2917*38fd1498Szrj 
2918*38fd1498Szrj   update_crossing_jump_flags ();
2919*38fd1498Szrj 
2920*38fd1498Szrj   /* Clear bb->aux fields that the above routines were using.  */
2921*38fd1498Szrj   clear_aux_for_blocks ();
2922*38fd1498Szrj 
2923*38fd1498Szrj   crossing_edges.release ();
2924*38fd1498Szrj 
2925*38fd1498Szrj   /* ??? FIXME: DF generates the bb info for a block immediately.
2926*38fd1498Szrj      And by immediately, I mean *during* creation of the block.
2927*38fd1498Szrj 
2928*38fd1498Szrj 	#0  df_bb_refs_collect
2929*38fd1498Szrj 	#1  in df_bb_refs_record
2930*38fd1498Szrj 	#2  in create_basic_block_structure
2931*38fd1498Szrj 
2932*38fd1498Szrj      Which means that the bb_has_eh_pred test in df_bb_refs_collect
2933*38fd1498Szrj      will *always* fail, because no edges can have been added to the
2934*38fd1498Szrj      block yet.  Which of course means we don't add the right
2935*38fd1498Szrj      artificial refs, which means we fail df_verify (much) later.
2936*38fd1498Szrj 
2937*38fd1498Szrj      Cleanest solution would seem to make DF_DEFER_INSN_RESCAN imply
2938*38fd1498Szrj      that we also shouldn't grab data from the new blocks those new
2939*38fd1498Szrj      insns are in either.  In this way one can create the block, link
2940*38fd1498Szrj      it up properly, and have everything Just Work later, when deferred
2941*38fd1498Szrj      insns are processed.
2942*38fd1498Szrj 
2943*38fd1498Szrj      In the meantime, we have no other option but to throw away all
2944*38fd1498Szrj      of the DF data and recompute it all.  */
2945*38fd1498Szrj   if (fun->eh->lp_array)
2946*38fd1498Szrj     {
2947*38fd1498Szrj       df_finish_pass (true);
2948*38fd1498Szrj       df_scan_alloc (NULL);
2949*38fd1498Szrj       df_scan_blocks ();
2950*38fd1498Szrj       /* Not all post-landing pads use all of the EH_RETURN_DATA_REGNO
2951*38fd1498Szrj 	 data.  We blindly generated all of them when creating the new
2952*38fd1498Szrj 	 landing pad.  Delete those assignments we don't use.  */
2953*38fd1498Szrj       df_set_flags (DF_LR_RUN_DCE);
2954*38fd1498Szrj       df_analyze ();
2955*38fd1498Szrj     }
2956*38fd1498Szrj 
2957*38fd1498Szrj   /* Make sure to process deferred rescans and clear changeable df flags.  */
2958*38fd1498Szrj   return TODO_df_finish;
2959*38fd1498Szrj }
2960*38fd1498Szrj 
2961*38fd1498Szrj } // anon namespace
2962*38fd1498Szrj 
2963*38fd1498Szrj rtl_opt_pass *
2964*38fd1498Szrj make_pass_partition_blocks (gcc::context *ctxt)
2965*38fd1498Szrj {
2966*38fd1498Szrj   return new pass_partition_blocks (ctxt);
2967*38fd1498Szrj }
2968