/* Basic block reordering routines for the GNU compiler.
   Copyright (C) 2000-2018 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

/* This file contains the "reorder blocks" pass, which changes the control
   flow of a function to encounter fewer branches; the "partition blocks"
   pass, which divides the basic blocks into "hot" and "cold" partitions,
   which are kept separate; and the "duplicate computed gotos" pass, which
   duplicates blocks ending in an indirect jump.

   There are two algorithms for "reorder blocks": the "simple" algorithm,
   which just rearranges blocks, trying to minimize the number of executed
   unconditional branches; and the "software trace cache" algorithm, which
   also copies code, and in general tries a lot harder to have long linear
   pieces of machine code executed.  This algorithm is described next.  */

/* This (greedy) algorithm constructs traces in several rounds.
   The construction starts from "seeds".  The seed for the first round
   is the entry point of the function.  When there is more than one seed,
   the one with the lowest key in the heap is selected first (see bb_to_key).
   Then the algorithm repeatedly adds the most probable successor to the end
   of a trace.  Finally it connects the traces.

   There are two parameters: Branch Threshold and Exec Threshold.
   If the probability of an edge to a successor of the current basic block is
   lower than Branch Threshold or its count is lower than Exec Threshold,
   then the successor will be the seed in one of the next rounds.
   Each round has these parameters lower than the previous one.
   The last round has to have these parameters set to zero so that the
   remaining blocks are picked up.

   The algorithm selects the most probable successor from all unvisited
   successors and successors that have been added to this trace.
   The other successors (that have not been "sent" to the next round) will be
   other seeds for this round and the secondary traces will start from them.
   If the successor has not been visited in this trace, it is added to the
   trace (however, there is some heuristic for simple branches).
   If the successor has been visited in this trace, a loop has been found.
   If the loop has many iterations, the loop is rotated so that the source
   block of the most probable edge going out of the loop is the last block
   of the trace.
   If the loop has few iterations and there is no edge from the last block of
   the loop going out of the loop, the loop header is duplicated.

   When connecting traces, the algorithm first checks whether there is an edge
   from the last block of a trace to the first block of another trace.
   When there are still some unconnected traces it checks whether there exists
   a basic block BB such that BB is a successor of the last block of a trace
   and BB is a predecessor of the first block of another trace.  In this case,
   BB is duplicated, added at the end of the first trace and the traces are
   connected through it.
   The rest of the traces are simply connected so there will be a jump to the
   beginning of the rest of the traces.

   The above description is for the full algorithm, which is used when the
   function is optimized for speed.  When the function is optimized for size,
   in order to reduce long jumps and connect more fallthru edges, the
   algorithm is modified as follows:
   (1) Break long traces into short ones.  A trace is broken at a block that
   has multiple predecessors/successors during trace discovery.  When
   connecting traces, only connect Trace n with Trace n + 1.  This change
   reduces most long jumps compared with the above algorithm.
   (2) Ignore the edge probability and count for fallthru edges.
   (3) Keep the original order of blocks when there is no chance to fall
   through.  We rely on the results of cfg_cleanup.

   To implement the change for code size optimization, the block's index is
   selected as the key and all traces are found in one round.

   References:

   "Software Trace Cache"
   A. Ramirez, J. Larriba-Pey, C. Navarro, J. Torrellas and M. Valero; 1999
   http://citeseer.nj.nec.com/15361.html

*/
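
/* An illustrative sketch (made-up blocks and numbers, not taken from the
   reference above): consider this CFG, with edge probabilities as labelled:

	    A
	   / \
      90% /   \ 10%
	 B     C
	  \   /
	   \ /
	    D

   With the first-round branch threshold of 40%, the trace grown from the
   seed A follows A -> B -> D; the A->C edge falls below the threshold, so
   C is "sent" to a later round and becomes the seed of its own trace.
   The final layout A B D C lets the hot path fall through, and only the
   rarely taken path through C ends in an unconditional jump back to D.  */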

#include "config.h"
#define INCLUDE_ALGORITHM /* stable_sort */
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "target.h"
#include "rtl.h"
#include "tree.h"
#include "cfghooks.h"
#include "df.h"
#include "memmodel.h"
#include "optabs.h"
#include "regs.h"
#include "emit-rtl.h"
#include "output.h"
#include "expr.h"
#include "params.h"
#include "tree-pass.h"
#include "cfgrtl.h"
#include "cfganal.h"
#include "cfgbuild.h"
#include "cfgcleanup.h"
#include "bb-reorder.h"
#include "except.h"
#include "fibonacci_heap.h"
#include "stringpool.h"
#include "attribs.h"
#include "common/common-target.h"

/* The number of rounds.  In most cases there will only be 4 rounds, but
   when partitioning hot and cold basic blocks into separate sections of
   the object file there will be an extra round.  */
#define N_ROUNDS 5

struct target_bb_reorder default_target_bb_reorder;
#if SWITCHABLE_TARGET
struct target_bb_reorder *this_target_bb_reorder = &default_target_bb_reorder;
#endif

#define uncond_jump_length \
  (this_target_bb_reorder->x_uncond_jump_length)

/* Branch thresholds in thousandths (per mille) of the REG_BR_PROB_BASE.  */
static const int branch_threshold[N_ROUNDS] = {400, 200, 100, 0, 0};

/* Exec thresholds in thousandths (per mille) of the count of bb 0.  */
static const int exec_threshold[N_ROUNDS] = {500, 200, 50, 0, 0};

/* If the edge count is lower than DUPLICATION_THRESHOLD per mille of the
   entry block count, the edge destination is not duplicated while
   connecting traces.  */
#define DUPLICATION_THRESHOLD 100
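
/* Worked example of the thresholds above (illustrative numbers only):
   with REG_BR_PROB_BASE == 10000 and a maximum entry block count of 1000,
   round 1 passes branch_th == 10000 * 400 / 1000 == 4000 to
   find_traces_1_round (edges below a 40% probability are cut) together
   with count_th == 1000 * 500 / 1000 == 500, and connect_traces only
   duplicates a destination when the edge count reaches
   1000 * 100 / 1000 == 100.  */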

typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;

/* Structure to hold needed information for each basic block.  */
struct bbro_basic_block_data
{
  /* Which trace is the bb start of (-1 means it is not a start of any).  */
  int start_of_trace;

  /* Which trace is the bb end of (-1 means it is not an end of any).  */
  int end_of_trace;

  /* Which trace is the bb in?  */
  int in_trace;

  /* Which trace was this bb visited in?  */
  int visited;

  /* Cached maximum frequency of interesting incoming edges.
     Minus one means not yet computed.  */
  int priority;

  /* Which heap is BB in (if any)?  */
  bb_heap_t *heap;

  /* Which heap node is BB in (if any)?  */
  bb_heap_node_t *node;
};

/* The current size of the following dynamic array.  */
static int array_size;

/* The array which holds needed information for basic blocks.  */
static bbro_basic_block_data *bbd;

/* To avoid frequent reallocation the size of the array is greater than
   needed; the number of elements is (not less than) 1.25 * size_wanted.  */
#define GET_ARRAY_SIZE(X) ((((X) / 4) + 1) * 5)
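
/* E.g. GET_ARRAY_SIZE (100) == ((100 / 4) + 1) * 5 == 130, i.e. roughly
   1.25 * 100 plus a little slack from rounding up, so a few extra blocks
   created later by duplication do not immediately force a reallocation.  */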

/* Free the memory and set the pointer to NULL.  */
#define FREE(P) (gcc_assert (P), free (P), P = 0)

/* Structure for holding information about a trace.  */
struct trace
{
  /* First and last basic block of the trace.  */
  basic_block first, last;

  /* The round of the STC creation which this trace was found in.  */
  int round;

  /* The length (i.e. the number of basic blocks) of the trace.  */
  int length;
};

/* Maximum count of one of the entry blocks.  */
static profile_count max_entry_count;

/* Local function prototypes.  */
static void find_traces_1_round (int, profile_count, struct trace *, int *,
				 int, bb_heap_t **, int);
static basic_block copy_bb (basic_block, edge, basic_block, int);
static long bb_to_key (basic_block);
static bool better_edge_p (const_basic_block, const_edge, profile_probability,
			   profile_count, profile_probability, profile_count,
			   const_edge);
static bool copy_bb_p (const_basic_block, int);

/* Return the trace number in which BB was visited.  */

static int
bb_visited_trace (const_basic_block bb)
{
  gcc_assert (bb->index < array_size);
  return bbd[bb->index].visited;
}

/* This function marks BB that it was visited in trace number TRACE.  */

static void
mark_bb_visited (basic_block bb, int trace)
{
  bbd[bb->index].visited = trace;
  if (bbd[bb->index].heap)
    {
      bbd[bb->index].heap->delete_node (bbd[bb->index].node);
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;
    }
}

/* Check to see if bb should be pushed into the next round of trace
   collections or not.  Reasons for pushing the block forward are:
   1) the block is cold, we are doing partitioning, and there will be
   another round (cold partition blocks are not supposed to be collected
   into traces until the very last round); or 2) there will be another
   round, and the basic block is not "hot enough" for the current round
   of trace collection.  */

static bool
push_to_next_round_p (const_basic_block bb, int round, int number_of_rounds,
		      profile_count count_th)
{
  bool there_exists_another_round;
  bool block_not_hot_enough;

  there_exists_another_round = round < number_of_rounds - 1;

  block_not_hot_enough = (bb->count < count_th
			  || probably_never_executed_bb_p (cfun, bb));

  if (there_exists_another_round
      && block_not_hot_enough)
    return true;
  else
    return false;
}

/* Find the traces for Software Trace Cache.  Chain each trace through
   RBI()->next.  Store the number of traces in *N_TRACES and the
   description of the traces in TRACES.  */

static void
find_traces (int *n_traces, struct trace *traces)
{
  int i;
  int number_of_rounds;
  edge e;
  edge_iterator ei;
  bb_heap_t *heap = new bb_heap_t (LONG_MIN);

  /* Add one extra round of trace collection when partitioning hot/cold
     basic blocks into separate sections.  The last round is for all the
     cold blocks (and ONLY the cold blocks).  */

  number_of_rounds = N_ROUNDS - 1;

  /* Insert entry points of function into heap.  */
  max_entry_count = profile_count::zero ();
  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
    {
      bbd[e->dest->index].heap = heap;
      bbd[e->dest->index].node = heap->insert (bb_to_key (e->dest), e->dest);
      if (e->dest->count > max_entry_count)
	max_entry_count = e->dest->count;
    }

  /* Find the traces.  */
  for (i = 0; i < number_of_rounds; i++)
    {
      profile_count count_threshold;

      if (dump_file)
	fprintf (dump_file, "STC - round %d\n", i + 1);

      count_threshold = max_entry_count.apply_scale (exec_threshold[i], 1000);

      find_traces_1_round (REG_BR_PROB_BASE * branch_threshold[i] / 1000,
			   count_threshold, traces, n_traces, i, &heap,
			   number_of_rounds);
    }
  delete heap;

  if (dump_file)
    {
      for (i = 0; i < *n_traces; i++)
	{
	  basic_block bb;
	  fprintf (dump_file, "Trace %d (round %d):  ", i + 1,
		   traces[i].round + 1);
	  for (bb = traces[i].first;
	       bb != traces[i].last;
	       bb = (basic_block) bb->aux)
	    {
	      fprintf (dump_file, "%d [", bb->index);
	      bb->count.dump (dump_file);
	      fprintf (dump_file, "] ");
	    }
	  fprintf (dump_file, "%d [", bb->index);
	  bb->count.dump (dump_file);
	  fprintf (dump_file, "]\n");
	}
      fflush (dump_file);
    }
}

/* Rotate loop whose back edge is BACK_EDGE in the tail of trace TRACE
   (with sequential number TRACE_N).  */

static basic_block
rotate_loop (edge back_edge, struct trace *trace, int trace_n)
{
  basic_block bb;

  /* Information about the best end (end after rotation) of the loop.  */
  basic_block best_bb = NULL;
  edge best_edge = NULL;
  profile_count best_count = profile_count::uninitialized ();
  /* The best edge is preferred when its destination is not visited yet
     or is a start block of some trace.  */
  bool is_preferred = false;

  /* Find the most frequent edge that goes out from current trace.  */
  bb = back_edge->dest;
  do
    {
      edge e;
      edge_iterator ei;

      FOR_EACH_EDGE (e, ei, bb->succs)
	if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
	    && bb_visited_trace (e->dest) != trace_n
	    && (e->flags & EDGE_CAN_FALLTHRU)
	    && !(e->flags & EDGE_COMPLEX))
	  {
	    if (is_preferred)
	      {
		/* The best edge is preferred.  */
		if (!bb_visited_trace (e->dest)
		    || bbd[e->dest->index].start_of_trace >= 0)
		  {
		    /* The current edge E is also preferred.  */
		    if (e->count () > best_count)
		      {
			best_count = e->count ();
			best_edge = e;
			best_bb = bb;
		      }
		  }
	      }
	    else
	      {
		if (!bb_visited_trace (e->dest)
		    || bbd[e->dest->index].start_of_trace >= 0)
		  {
		    /* The current edge E is preferred.  */
		    is_preferred = true;
		    best_count = e->count ();
		    best_edge = e;
		    best_bb = bb;
		  }
		else
		  {
		    if (!best_edge || e->count () > best_count)
		      {
			best_count = e->count ();
			best_edge = e;
			best_bb = bb;
		      }
		  }
	      }
	  }
      bb = (basic_block) bb->aux;
    }
  while (bb != back_edge->dest);

  if (best_bb)
    {
      /* Rotate the loop so that the BEST_EDGE goes out from the last block of
	 the trace.  */
      if (back_edge->dest == trace->first)
	{
	  trace->first = (basic_block) best_bb->aux;
	}
      else
	{
	  basic_block prev_bb;

	  for (prev_bb = trace->first;
	       prev_bb->aux != back_edge->dest;
	       prev_bb = (basic_block) prev_bb->aux)
	    ;
	  prev_bb->aux = best_bb->aux;

	  /* Try to get rid of uncond jump to cond jump.  */
	  if (single_succ_p (prev_bb))
	    {
	      basic_block header = single_succ (prev_bb);

	      /* Duplicate HEADER if it is a small block containing cond jump
		 in the end.  */
	      if (any_condjump_p (BB_END (header)) && copy_bb_p (header, 0)
		  && !CROSSING_JUMP_P (BB_END (header)))
		copy_bb (header, single_succ_edge (prev_bb), prev_bb, trace_n);
	    }
	}
    }
  else
    {
      /* We have not found suitable loop tail so do no rotation.  */
      best_bb = back_edge->src;
    }
  best_bb->aux = NULL;
  return best_bb;
}
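
/* A sketch of the rotation above (hypothetical blocks): if the tail of the
   trace is the loop L1 -> L2 -> L3 with back edge L3 -> L1, and the most
   frequent exit edge leaves from L2, the tail is rewired to L3 -> L1 -> L2,
   so that the frequent exit falls out of the last block of the trace
   instead of branching out of the middle of the loop body.  */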

/* One round of finding traces.  Find traces for BRANCH_TH and COUNT_TH i.e.
   do not include into traces basic blocks whose probability is lower than
   BRANCH_TH or whose count is lower than COUNT_TH.  Store the new traces
   into TRACES and modify the number of traces *N_TRACES.  Set the round
   (which the trace belongs to) to ROUND.  The function expects starting
   basic blocks to be in *HEAP and will delete *HEAP and store starting
   points for the next round into new *HEAP.  */

static void
find_traces_1_round (int branch_th, profile_count count_th,
		     struct trace *traces, int *n_traces, int round,
		     bb_heap_t **heap, int number_of_rounds)
{
  /* Heap for discarded basic blocks which are possible starting points for
     the next round.  */
  bb_heap_t *new_heap = new bb_heap_t (LONG_MIN);
  bool for_size = optimize_function_for_size_p (cfun);

  while (!(*heap)->empty ())
    {
      basic_block bb;
      struct trace *trace;
      edge best_edge, e;
      long key;
      edge_iterator ei;

      bb = (*heap)->extract_min ();
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;

      if (dump_file)
	fprintf (dump_file, "Getting bb %d\n", bb->index);

      /* If the BB's count is too low, send BB to the next round.  When
	 partitioning hot/cold blocks into separate sections, make sure all
	 the cold blocks (and ONLY the cold blocks) go into the (extra) final
	 round.  When optimizing for size, do not push to next round.  */

      if (!for_size
	  && push_to_next_round_p (bb, round, number_of_rounds,
				   count_th))
	{
	  int key = bb_to_key (bb);
	  bbd[bb->index].heap = new_heap;
	  bbd[bb->index].node = new_heap->insert (key, bb);

	  if (dump_file)
	    fprintf (dump_file,
		     "  Possible start point of next round: %d (key: %d)\n",
		     bb->index, key);
	  continue;
	}

      trace = traces + *n_traces;
      trace->first = bb;
      trace->round = round;
      trace->length = 0;
      bbd[bb->index].in_trace = *n_traces;
      (*n_traces)++;

      do
	{
	  bool ends_in_call;

	  /* The probability and count of the best edge.  */
	  profile_probability best_prob
	    = profile_probability::uninitialized ();
	  profile_count best_count = profile_count::uninitialized ();

	  best_edge = NULL;
	  mark_bb_visited (bb, *n_traces);
	  trace->length++;

	  if (dump_file)
	    fprintf (dump_file, "Basic block %d was visited in trace %d\n",
		     bb->index, *n_traces);

	  ends_in_call = block_ends_with_call_p (bb);

	  /* Select the successor that will be placed after BB.  */
	  FOR_EACH_EDGE (e, ei, bb->succs)
	    {
	      gcc_assert (!(e->flags & EDGE_FAKE));

	      if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
		continue;

	      if (bb_visited_trace (e->dest)
		  && bb_visited_trace (e->dest) != *n_traces)
		continue;

	      /* If partitioning hot/cold basic blocks, don't consider edges
		 that cross section boundaries.  */
	      if (BB_PARTITION (e->dest) != BB_PARTITION (bb))
		continue;

	      profile_probability prob = e->probability;
	      profile_count count = e->dest->count;

	      /* The only sensible preference for a call instruction is the
		 fallthru edge.  Don't bother selecting anything else.  */
	      if (ends_in_call)
		{
		  if (e->flags & EDGE_CAN_FALLTHRU)
		    {
		      best_edge = e;
		      best_prob = prob;
		      best_count = count;
		    }
		  continue;
		}

	      /* Edge that cannot be fallthru or improbable or infrequent
		 successor (i.e. it is an unsuitable successor).  When
		 optimizing for size, ignore the probability and count.  */
	      if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX)
		  || !prob.initialized_p ()
		  || ((prob.to_reg_br_prob_base () < branch_th
		       || e->count () < count_th) && (!for_size)))
		continue;

	      if (better_edge_p (bb, e, prob, count, best_prob, best_count,
				 best_edge))
		{
		  best_edge = e;
		  best_prob = prob;
		  best_count = count;
		}
	    }

	  /* If the best destination has multiple predecessors and can be
	     duplicated cheaper than a jump, don't allow it to be added to
	     a trace; we'll duplicate it when connecting the traces later.
	     However, we need to check that this duplication wouldn't leave
	     the best destination with only crossing predecessors, because
	     this would change its effective partition from hot to cold.  */
	  if (best_edge
	      && EDGE_COUNT (best_edge->dest->preds) >= 2
	      && copy_bb_p (best_edge->dest, 0))
	    {
	      bool only_crossing_preds = true;
	      edge e;
	      edge_iterator ei;
	      FOR_EACH_EDGE (e, ei, best_edge->dest->preds)
		if (e != best_edge && !(e->flags & EDGE_CROSSING))
		  {
		    only_crossing_preds = false;
		    break;
		  }
	      if (!only_crossing_preds)
		best_edge = NULL;
	    }

	  /* If the best destination has multiple successors or predecessors,
	     don't allow it to be added when optimizing for size.  This makes
	     sure predecessors with smaller index are handled before the best
	     destination.  It breaks long traces and reduces long jumps.

	     Take if-then-else as an example.
		A
	       / \
	      B   C
	       \ /
		D
	     If we do not remove the best edge B->D/C->D, the final order
	     might be A B D ... C.  C is at the end of the program.  If D's
	     successors and D are complicated, we might need long jumps for
	     A->C and C->D.  Similar issue for order: A C D ... B.

	     After removing the best edge, the final result will be ABCD/ACBD.
	     It does not add a jump compared with the previous order.  But it
	     reduces the possibility of long jumps.  */
	  if (best_edge && for_size
	      && (EDGE_COUNT (best_edge->dest->succs) > 1
		  || EDGE_COUNT (best_edge->dest->preds) > 1))
	    best_edge = NULL;

	  /* Add all non-selected successors to the heaps.  */
	  FOR_EACH_EDGE (e, ei, bb->succs)
	    {
	      if (e == best_edge
		  || e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
		  || bb_visited_trace (e->dest))
		continue;

	      key = bb_to_key (e->dest);

	      if (bbd[e->dest->index].heap)
		{
		  /* E->DEST is already in some heap.  */
		  if (key != bbd[e->dest->index].node->get_key ())
		    {
		      if (dump_file)
			{
			  fprintf (dump_file,
				   "Changing key for bb %d from %ld to %ld.\n",
				   e->dest->index,
				   (long) bbd[e->dest->index].node->get_key (),
				   key);
			}
		      bbd[e->dest->index].heap->replace_key
			(bbd[e->dest->index].node, key);
		    }
		}
	      else
		{
		  bb_heap_t *which_heap = *heap;

		  profile_probability prob = e->probability;

		  if (!(e->flags & EDGE_CAN_FALLTHRU)
		      || (e->flags & EDGE_COMPLEX)
		      || !prob.initialized_p ()
		      || prob.to_reg_br_prob_base () < branch_th
		      || e->count () < count_th)
		    {
		      /* When partitioning hot/cold basic blocks, make sure
			 the cold blocks (and only the cold blocks) all get
			 pushed to the last round of trace collection.  When
			 optimizing for size, do not push to next round.  */

		      if (!for_size && push_to_next_round_p (e->dest, round,
							     number_of_rounds,
							     count_th))
			which_heap = new_heap;
		    }

		  bbd[e->dest->index].heap = which_heap;
		  bbd[e->dest->index].node = which_heap->insert (key, e->dest);

		  if (dump_file)
		    {
		      fprintf (dump_file,
			       "  Possible start of %s round: %d (key: %ld)\n",
			       (which_heap == new_heap) ? "next" : "this",
			       e->dest->index, (long) key);
		    }

		}
	    }

	  if (best_edge) /* Suitable successor was found.  */
	    {
	      if (bb_visited_trace (best_edge->dest) == *n_traces)
		{
		  /* We do nothing with one basic block loops.  */
		  if (best_edge->dest != bb)
		    {
		      if (best_edge->count ()
			  > best_edge->dest->count.apply_scale (4, 5))
			{
			  /* The loop has at least 4 iterations.  If the loop
			     header is not the first block of the function
			     we can rotate the loop.  */

			  if (best_edge->dest
			      != ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
			    {
			      if (dump_file)
				{
				  fprintf (dump_file,
					   "Rotating loop %d - %d\n",
					   best_edge->dest->index, bb->index);
				}
			      bb->aux = best_edge->dest;
			      bbd[best_edge->dest->index].in_trace =
				(*n_traces) - 1;
			      bb = rotate_loop (best_edge, trace, *n_traces);
			    }
			}
		      else
			{
			  /* The loop has less than 4 iterations.  */

			  if (single_succ_p (bb)
			      && copy_bb_p (best_edge->dest,
					    optimize_edge_for_speed_p
					    (best_edge)))
			    {
			      bb = copy_bb (best_edge->dest, best_edge, bb,
					    *n_traces);
			      trace->length++;
			    }
			}
		    }

		  /* Terminate the trace.  */
		  break;
		}
	      else
		{
		  /* Check for a situation

		    A
		   /|
		  B |
		   \|
		    C

		  where
		  AB->count () + BC->count () >= AC->count ().
		  (i.e. 2 * B->count >= AC->count )
		  Best ordering is then A B C.

		  When optimizing for size, A B C is always the best order.

		  This situation is created for example by:

		  if (A) B;
		  C;

		  */

		  FOR_EACH_EDGE (e, ei, bb->succs)
		    if (e != best_edge
			&& (e->flags & EDGE_CAN_FALLTHRU)
			&& !(e->flags & EDGE_COMPLEX)
			&& !bb_visited_trace (e->dest)
			&& single_pred_p (e->dest)
			&& !(e->flags & EDGE_CROSSING)
			&& single_succ_p (e->dest)
			&& (single_succ_edge (e->dest)->flags
			    & EDGE_CAN_FALLTHRU)
			&& !(single_succ_edge (e->dest)->flags & EDGE_COMPLEX)
			&& single_succ (e->dest) == best_edge->dest
			&& (e->dest->count.apply_scale (2, 1)
			    >= best_edge->count () || for_size))
		      {
			best_edge = e;
			if (dump_file)
			  fprintf (dump_file, "Selecting BB %d\n",
				   best_edge->dest->index);
			break;
		      }

		  bb->aux = best_edge->dest;
		  bbd[best_edge->dest->index].in_trace = (*n_traces) - 1;
		  bb = best_edge->dest;
		}
	    }
	}
      while (best_edge);
      trace->last = bb;
      bbd[trace->first->index].start_of_trace = *n_traces - 1;
      if (bbd[trace->last->index].end_of_trace != *n_traces - 1)
	{
	  bbd[trace->last->index].end_of_trace = *n_traces - 1;
	  /* Update the cached maximum frequency for interesting predecessor
	     edges for successors of the new trace end.  */
	  FOR_EACH_EDGE (e, ei, trace->last->succs)
	    if (EDGE_FREQUENCY (e) > bbd[e->dest->index].priority)
	      bbd[e->dest->index].priority = EDGE_FREQUENCY (e);
	}

      /* The trace is terminated so we have to recount the keys in heap
	 (some block can have a lower key because now one of its predecessors
	 is an end of the trace).  */
      FOR_EACH_EDGE (e, ei, bb->succs)
	{
	  if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
	      || bb_visited_trace (e->dest))
	    continue;

	  if (bbd[e->dest->index].heap)
	    {
	      key = bb_to_key (e->dest);
	      if (key != bbd[e->dest->index].node->get_key ())
		{
		  if (dump_file)
		    {
		      fprintf (dump_file,
			       "Changing key for bb %d from %ld to %ld.\n",
			       e->dest->index,
			       (long) bbd[e->dest->index].node->get_key (),
			       key);
		    }
		  bbd[e->dest->index].heap->replace_key
		    (bbd[e->dest->index].node, key);
		}
	    }
	}
    }

  delete (*heap);

  /* "Return" the new heap.  */
  *heap = new_heap;
}

/* Create a duplicate of the basic block OLD_BB and redirect edge E to it, add
   it to trace after BB, mark OLD_BB visited and update the pass's data
   structures (TRACE is the number of the trace OLD_BB is duplicated to).  */

static basic_block
copy_bb (basic_block old_bb, edge e, basic_block bb, int trace)
{
  basic_block new_bb;

  new_bb = duplicate_block (old_bb, e, bb);
  BB_COPY_PARTITION (new_bb, old_bb);

  gcc_assert (e->dest == new_bb);

  if (dump_file)
    fprintf (dump_file,
	     "Duplicated bb %d (created bb %d)\n",
	     old_bb->index, new_bb->index);

  if (new_bb->index >= array_size
      || last_basic_block_for_fn (cfun) > array_size)
    {
      int i;
      int new_size;

      new_size = MAX (last_basic_block_for_fn (cfun), new_bb->index + 1);
      new_size = GET_ARRAY_SIZE (new_size);
      bbd = XRESIZEVEC (bbro_basic_block_data, bbd, new_size);
      for (i = array_size; i < new_size; i++)
	{
	  bbd[i].start_of_trace = -1;
	  bbd[i].end_of_trace = -1;
	  bbd[i].in_trace = -1;
	  bbd[i].visited = 0;
	  bbd[i].priority = -1;
	  bbd[i].heap = NULL;
	  bbd[i].node = NULL;
	}
      array_size = new_size;

      if (dump_file)
	{
	  fprintf (dump_file,
		   "Growing the dynamic array to %d elements.\n",
		   array_size);
	}
    }

  gcc_assert (!bb_visited_trace (e->dest));
  mark_bb_visited (new_bb, trace);
  new_bb->aux = bb->aux;
  bb->aux = new_bb;

  bbd[new_bb->index].in_trace = trace;

  return new_bb;
}

/* Compute and return the key (for the heap) of the basic block BB.  */

static long
bb_to_key (basic_block bb)
{
  edge e;
  edge_iterator ei;

  /* Use index as key to align with its original order.  */
  if (optimize_function_for_size_p (cfun))
    return bb->index;

  /* Do not start in probably never executed blocks.  */

  if (BB_PARTITION (bb) == BB_COLD_PARTITION
      || probably_never_executed_bb_p (cfun, bb))
    return BB_FREQ_MAX;

  /* Prefer blocks whose predecessor is an end of some trace
     or whose predecessor edge is EDGE_DFS_BACK.  */
  int priority = bbd[bb->index].priority;
  if (priority == -1)
    {
      priority = 0;
      FOR_EACH_EDGE (e, ei, bb->preds)
	{
	  if ((e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
	       && bbd[e->src->index].end_of_trace >= 0)
	      || (e->flags & EDGE_DFS_BACK))
	    {
	      int edge_freq = EDGE_FREQUENCY (e);

	      if (edge_freq > priority)
		priority = edge_freq;
	    }
	}
      bbd[bb->index].priority = priority;
    }

  if (priority)
    /* The block with priority should have significantly lower key.  */
    return -(100 * BB_FREQ_MAX + 100 * priority
	     + bb->count.to_frequency (cfun));

  return -bb->count.to_frequency (cfun);
}
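
/* For instance (made-up numbers), a block with frequency 50 and no
   interesting predecessors gets key -50, while an equally frequent block
   whose predecessor ends some trace through an edge with EDGE_FREQUENCY 70
   gets key -(100 * BB_FREQ_MAX + 100 * 70 + 50).  Since the heaps are
   min-heaps, the second block is extracted much earlier as a seed;
   cold-partition blocks get the positive key BB_FREQ_MAX and are picked
   last.  */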

/* Return true when the edge E from basic block BB is better than the
   temporary best edge (details are in function).  The probability of edge E
   is PROB.  The count of the successor is COUNT.  The current best
   probability is BEST_PROB, the best count is BEST_COUNT.
   The edge is considered to be equivalent when PROB does not differ much from
   BEST_PROB; similarly for count.  */

static bool
better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
	       profile_count count, profile_probability best_prob,
	       profile_count best_count, const_edge cur_best_edge)
{
  bool is_better_edge;

  /* The BEST_* values do not have to be best, but can be a bit smaller than
     maximum values.  */
  profile_probability diff_prob = best_prob.apply_scale (1, 10);

  /* The smaller one is better to keep the original order.  */
  if (optimize_function_for_size_p (cfun))
    return !cur_best_edge
	   || cur_best_edge->dest->index > e->dest->index;

  /* Those edges are so expensive that continuing a trace is not useful
     performance wise.  */
  if (e->flags & (EDGE_ABNORMAL | EDGE_EH))
    return false;

  if (prob > best_prob + diff_prob
      || (!best_prob.initialized_p ()
	  && prob > profile_probability::guessed_never ()))
    /* The edge has higher probability than the temporary best edge.  */
    is_better_edge = true;
  else if (prob < best_prob - diff_prob)
    /* The edge has lower probability than the temporary best edge.  */
    is_better_edge = false;
  else
    {
      profile_count diff_count = best_count.apply_scale (1, 10);
      if (count < best_count - diff_count
	  || (!best_count.initialized_p ()
	      && count.nonzero_p ()))
	/* The edge and the temporary best edge have almost equivalent
	   probabilities.  The higher count of a successor now means
	   that there is another edge going into that successor.
	   This successor has lower count so it is better.  */
	is_better_edge = true;
      else if (count > best_count + diff_count)
	/* This successor has higher count so it is worse.  */
	is_better_edge = false;
      else if (e->dest->prev_bb == bb)
	/* The edges have equivalent probabilities and the successors
	   have equivalent frequencies.  Select the previous successor.  */
	is_better_edge = true;
      else
	is_better_edge = false;
    }

  return is_better_edge;
}
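
/* A worked example of the bands above (hypothetical numbers): with
   best_prob == 50%, diff_prob is 5%, so a candidate edge wins outright
   only above 55% probability and loses outright only below 45%.  Between
   those bounds the successor counts are compared with their own 10% band
   (a lower successor count wins, since a high count there implies other
   incoming edges), and as the final tie-breaker the successor that already
   follows BB in the current layout is preferred.  */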

/* Return true when the edge E is better than the temporary best edge
   CUR_BEST_EDGE.  If SRC_INDEX_P is true, the function compares the src bb of
   E and CUR_BEST_EDGE; otherwise it will compare the dest bb.
   BEST_LEN is the trace length of src (or dest) bb in CUR_BEST_EDGE.
   TRACES record the information about traces.
   When optimizing for size, the edge with smaller index is better.
   When optimizing for speed, the edge with bigger probability or longer trace
   is better.  */

static bool
connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
		       const_edge cur_best_edge, struct trace *traces)
{
  int e_index;
  int b_index;
  bool is_better_edge;

  if (!cur_best_edge)
    return true;

  if (optimize_function_for_size_p (cfun))
    {
      e_index = src_index_p ? e->src->index : e->dest->index;
      b_index = src_index_p ? cur_best_edge->src->index
			      : cur_best_edge->dest->index;
      /* The smaller one is better to keep the original order.  */
      return b_index > e_index;
    }

  if (src_index_p)
    {
      e_index = e->src->index;

      /* We are looking for a predecessor, so probabilities are not that
	 informative.  We do not want to connect A to B because A has
	 only one successor (probability is 100%) while there is edge
	 A' to B where probability is 90% but which is much more frequent.  */
      if (e->count () > cur_best_edge->count ())
	/* The edge has higher count than the temporary best edge.  */
	is_better_edge = true;
      else if (e->count () < cur_best_edge->count ())
	/* The edge has lower count than the temporary best edge.  */
	is_better_edge = false;
      else if (e->probability > cur_best_edge->probability)
	/* The edge has higher probability than the temporary best edge.  */
	is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
	/* The edge has lower probability than the temporary best edge.  */
	is_better_edge = false;
      else if (traces[bbd[e_index].end_of_trace].length > best_len)
	/* The edge and the temporary best edge have equivalent
	   probabilities.  The edge with longer trace is better.  */
	is_better_edge = true;
      else
	is_better_edge = false;
    }
  else
    {
      e_index = e->dest->index;

      if (e->probability > cur_best_edge->probability)
	/* The edge has higher probability than the temporary best edge.  */
	is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
	/* The edge has lower probability than the temporary best edge.  */
	is_better_edge = false;
      else if (traces[bbd[e_index].start_of_trace].length > best_len)
	/* The edge and the temporary best edge have equivalent
	   probabilities.  The edge with longer trace is better.  */
	is_better_edge = true;
      else
	is_better_edge = false;
    }

  return is_better_edge;
}

/* Connect traces in array TRACES, N_TRACES is the count of traces.  */

static void
connect_traces (int n_traces, struct trace *traces)
{
  int i;
  bool *connected;
  bool two_passes;
  int last_trace;
  int current_pass;
  int current_partition;
  profile_count count_threshold;
  bool for_size = optimize_function_for_size_p (cfun);

  count_threshold = max_entry_count.apply_scale (DUPLICATION_THRESHOLD, 1000);

  connected = XCNEWVEC (bool, n_traces);
  last_trace = -1;
  current_pass = 1;
  current_partition = BB_PARTITION (traces[0].first);
  two_passes = false;

  if (crtl->has_bb_partition)
    for (i = 0; i < n_traces && !two_passes; i++)
      if (BB_PARTITION (traces[0].first)
	  != BB_PARTITION (traces[i].first))
	two_passes = true;

  for (i = 0; i < n_traces || (two_passes && current_pass == 1) ; i++)
    {
      int t = i;
      int t2;
      edge e, best;
      int best_len;

      if (i >= n_traces)
	{
	  gcc_assert (two_passes && current_pass == 1);
	  i = 0;
	  t = i;
	  current_pass = 2;
	  if (current_partition == BB_HOT_PARTITION)
	    current_partition = BB_COLD_PARTITION;
	  else
	    current_partition = BB_HOT_PARTITION;
	}

      if (connected[t])
	continue;

      if (two_passes
	  && BB_PARTITION (traces[t].first) != current_partition)
	continue;

      connected[t] = true;

      /* Find the predecessor traces.  */
      for (t2 = t; t2 > 0;)
	{
	  edge_iterator ei;
	  best = NULL;
	  best_len = 0;
	  FOR_EACH_EDGE (e, ei, traces[t2].first->preds)
	    {
	      int si = e->src->index;

	      if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
		  && (e->flags & EDGE_CAN_FALLTHRU)
		  && !(e->flags & EDGE_COMPLEX)
		  && bbd[si].end_of_trace >= 0
		  && !connected[bbd[si].end_of_trace]
		  && (BB_PARTITION (e->src) == current_partition)
		  && connect_better_edge_p (e, true, best_len, best, traces))
		{
		  best = e;
		  best_len = traces[bbd[si].end_of_trace].length;
		}
	    }
	  if (best)
	    {
	      best->src->aux = best->dest;
	      t2 = bbd[best->src->index].end_of_trace;
	      connected[t2] = true;

	      if (dump_file)
		{
		  fprintf (dump_file, "Connection: %d %d\n",
			   best->src->index, best->dest->index);
		}
	    }
	  else
	    break;
	}

      if (last_trace >= 0)
	traces[last_trace].last->aux = traces[t2].first;
      last_trace = t;

      /* Find the successor traces.  */
      while (1)
	{
	  /* Find the continuation of the chain.  */
	  edge_iterator ei;
	  best = NULL;
	  best_len = 0;
	  FOR_EACH_EDGE (e, ei, traces[t].last->succs)
	    {
	      int di = e->dest->index;

	      if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
		  && (e->flags & EDGE_CAN_FALLTHRU)
		  && !(e->flags & EDGE_COMPLEX)
		  && bbd[di].start_of_trace >= 0
		  && !connected[bbd[di].start_of_trace]
		  && (BB_PARTITION (e->dest) == current_partition)
		  && connect_better_edge_p (e, false, best_len, best, traces))
		{
		  best = e;
		  best_len = traces[bbd[di].start_of_trace].length;
		}
	    }

	  if (for_size)
	    {
	      if (!best)
		/* Stop finding the successor traces.  */
		break;

	      /* It is OK to connect block n with block n + 1 or a block
		 before n.  For others, only connect to the loop header.  */
	      if (best->dest->index > (traces[t].last->index + 1))
		{
		  int count = EDGE_COUNT (best->dest->preds);

		  FOR_EACH_EDGE (e, ei, best->dest->preds)
		    if (e->flags & EDGE_DFS_BACK)
		      count--;

		  /* If dest has multiple predecessors, skip it.  We expect
		     that one predecessor with smaller index connects with it
		     later.  */
		  if (count != 1)
		    break;
		}

	      /* Only connect Trace n with Trace n + 1.  It is conservative
		 to keep the order as close as possible to the original order.
		 It also helps to reduce long jumps.  */
	      if (last_trace != bbd[best->dest->index].start_of_trace - 1)
		break;

	      if (dump_file)
		fprintf (dump_file, "Connection: %d %d\n",
			 best->src->index, best->dest->index);

	      t = bbd[best->dest->index].start_of_trace;
	      traces[last_trace].last->aux = traces[t].first;
	      connected[t] = true;
	      last_trace = t;
	    }
	  else if (best)
	    {
	      if (dump_file)
		{
		  fprintf (dump_file, "Connection: %d %d\n",
			   best->src->index, best->dest->index);
		}
	      t = bbd[best->dest->index].start_of_trace;
	      traces[last_trace].last->aux = traces[t].first;
	      connected[t] = true;
	      last_trace = t;
	    }
	  else
	    {
	      /* Try to connect the traces by duplication of 1 block.  */
	      edge e2;
	      basic_block next_bb = NULL;
	      bool try_copy = false;

	      FOR_EACH_EDGE (e, ei, traces[t].last->succs)
		if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
		    && (e->flags & EDGE_CAN_FALLTHRU)
		    && !(e->flags & EDGE_COMPLEX)
		    && (!best || e->probability > best->probability))
		  {
		    edge_iterator ei;
		    edge best2 = NULL;
		    int best2_len = 0;

		    /* If the destination is a start of a trace which is only
		       one block long, then no need to search the successor
		       blocks of the trace.  Accept it.  */
		    if (bbd[e->dest->index].start_of_trace >= 0
			&& traces[bbd[e->dest->index].start_of_trace].length
			   == 1)
		      {
			best = e;
			try_copy = true;
			continue;
		      }

		    FOR_EACH_EDGE (e2, ei, e->dest->succs)
		      {
			int di = e2->dest->index;

			if (e2->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
			    || ((e2->flags & EDGE_CAN_FALLTHRU)
				&& !(e2->flags & EDGE_COMPLEX)
				&& bbd[di].start_of_trace >= 0
				&& !connected[bbd[di].start_of_trace]
				&& BB_PARTITION (e2->dest) == current_partition
				&& e2->count () >= count_threshold
				&& (!best2
				    || e2->probability > best2->probability
				    || (e2->probability == best2->probability
					&& traces[bbd[di].start_of_trace].length
					   > best2_len))))
			  {
			    best = e;
			    best2 = e2;
			    if (e2->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
			      best2_len
				= traces[bbd[di].start_of_trace].length;
			    else
			      best2_len = INT_MAX;
			    next_bb = e2->dest;
			    try_copy = true;
			  }
		      }
		  }

	      /* Copy tiny blocks always; copy larger blocks only when the
		 edge is traversed frequently enough.  */
	      if (try_copy
		  && BB_PARTITION (best->src) == BB_PARTITION (best->dest)
		  && copy_bb_p (best->dest,
				optimize_edge_for_speed_p (best)
				&& (!best->count ().initialized_p ()
				    || best->count () >= count_threshold)))
		{
		  basic_block new_bb;

		  if (dump_file)
		    {
		      fprintf (dump_file, "Connection: %d %d ",
			       traces[t].last->index, best->dest->index);
		      if (!next_bb)
			fputc ('\n', dump_file);
		      else if (next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
			fprintf (dump_file, "exit\n");
		      else
			fprintf (dump_file, "%d\n", next_bb->index);
		    }

		  new_bb = copy_bb (best->dest, best, traces[t].last, t);
		  traces[t].last = new_bb;
		  if (next_bb && next_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
		    {
		      t = bbd[next_bb->index].start_of_trace;
		      traces[last_trace].last->aux = traces[t].first;
		      connected[t] = true;
		      last_trace = t;
		    }
		  else
		    break;	/* Stop finding the successor traces.  */
		}
	      else
		break;	/* Stop finding the successor traces.  */
	    }
	}
    }

  if (dump_file)
    {
      basic_block bb;

      fprintf (dump_file, "Final order:\n");
      for (bb = traces[0].first; bb; bb = (basic_block) bb->aux)
	fprintf (dump_file, "%d ", bb->index);
      fprintf (dump_file, "\n");
      fflush (dump_file);
    }

  FREE (connected);
}
135438fd1498Szrj
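/* Illustrative sketch, not part of the pass: after connect_traces the
   chosen ordering is encoded entirely in the blocks' AUX fields, so a
   consumer can walk the final layout the same way the dump loop above
   does:

     basic_block bb;
     for (bb = traces[0].first; bb; bb = (basic_block) bb->aux)
       process (bb);   // 'process' is a hypothetical callback

   relink_block_chain (called from reorder_basic_blocks below) is what
   later turns this AUX chain into the real block order.  */
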
/* Return true when BB can and should be copied.  CODE_MAY_GROW is true
   when code size is allowed to grow by duplication.  */

static bool
copy_bb_p (const_basic_block bb, int code_may_grow)
{
  int size = 0;
  int max_size = uncond_jump_length;
  rtx_insn *insn;

  if (EDGE_COUNT (bb->preds) < 2)
    return false;
  if (!can_duplicate_block_p (bb))
    return false;

  /* Avoid duplicating blocks which have many successors (PR/13430).  */
  if (EDGE_COUNT (bb->succs) > 8)
    return false;

  if (code_may_grow && optimize_bb_for_speed_p (bb))
    max_size *= PARAM_VALUE (PARAM_MAX_GROW_COPY_BB_INSNS);

  FOR_BB_INSNS (bb, insn)
    {
      if (INSN_P (insn))
        size += get_attr_min_length (insn);
    }

  if (size <= max_size)
    return true;

  if (dump_file)
    {
      fprintf (dump_file,
               "Block %d can't be copied because its size = %d.\n",
               bb->index, size);
    }

  return false;
}

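/* A worked example for copy_bb_p (the numbers are illustrative
   assumptions, not target facts): with uncond_jump_length == 4 and
   PARAM_MAX_GROW_COPY_BB_INSNS at a default of 8, a block optimized
   for speed may be duplicated while the sum of its insns' minimal
   lengths is at most 4 * 8 == 32 units.  With CODE_MAY_GROW false the
   limit stays at 4, i.e. only blocks no larger than one unconditional
   jump are copied, so the duplication is essentially free.  */
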
/* Return the length of an unconditional jump instruction.  */

int
get_uncond_jump_length (void)
{
  int length;

  start_sequence ();
  rtx_code_label *label = emit_label (gen_label_rtx ());
  rtx_insn *jump = emit_jump_insn (targetm.gen_jump (label));
  length = get_attr_min_length (jump);
  end_sequence ();

  return length;
}

/* Create a forwarder block to OLD_BB starting with NEW_LABEL and in the
   other partition wrt OLD_BB.  */

static basic_block
create_eh_forwarder_block (rtx_code_label *new_label, basic_block old_bb)
{
  /* Split OLD_BB, so that EH pads always have only incoming EH edges;
     bb_has_eh_pred bbs are treated specially by the DF infrastructure.  */
  old_bb = split_block_after_labels (old_bb)->dest;

  /* Put the new label and a jump in the new basic block.  */
  rtx_insn *label = emit_label (new_label);
  rtx_code_label *old_label = block_label (old_bb);
  rtx_insn *jump = emit_jump_insn (targetm.gen_jump (old_label));
  JUMP_LABEL (jump) = old_label;

  /* Create the new basic block and put it in last position.  */
  basic_block last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
  basic_block new_bb = create_basic_block (label, jump, last_bb);
  new_bb->aux = last_bb->aux;
  new_bb->count = old_bb->count;
  last_bb->aux = new_bb;

  emit_barrier_after_bb (new_bb);

  make_single_succ_edge (new_bb, old_bb, 0);

  /* Make sure the new basic block is in the other partition.  */
  unsigned new_partition = BB_PARTITION (old_bb);
  new_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
  BB_SET_PARTITION (new_bb, new_partition);

  return new_bb;
}

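/* Illustrative sketch of the CFG shape produced above (not code):

     new_bb  (partition opposite OLD_BB):  NEW_LABEL: jump old_label
        |
        | single non-fallthru edge
        v
     old_bb  (split so only the labels/EH-pad role stays at its head)

   Every EH edge that would otherwise cross partitions can now be
   redirected to NEW_BB and thus stay inside its own section; only the
   unconditional jump from NEW_BB to OLD_BB crosses.  */
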
/* The common landing pad in block OLD_BB has edges from both partitions.
   Add a new landing pad that will just jump to the old one and split the
   edges so that no EH edge crosses partitions.  */

static void
sjlj_fix_up_crossing_landing_pad (basic_block old_bb)
{
  const unsigned lp_len = cfun->eh->lp_array->length ();
  edge_iterator ei;
  edge e;

  /* Generate the new common landing-pad label.  */
  rtx_code_label *new_label = gen_label_rtx ();
  LABEL_PRESERVE_P (new_label) = 1;

  /* Create the forwarder block.  */
  basic_block new_bb = create_eh_forwarder_block (new_label, old_bb);

  /* Create the map from old to new lp index and initialize it.  */
  unsigned *index_map = (unsigned *) alloca (lp_len * sizeof (unsigned));
  memset (index_map, 0, lp_len * sizeof (unsigned));

  /* Fix up the edges.  */
  for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
    if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
      {
        rtx_insn *insn = BB_END (e->src);
        rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);

        gcc_assert (note != NULL);
        const unsigned old_index = INTVAL (XEXP (note, 0));

        /* Generate the new landing-pad structure.  */
        if (index_map[old_index] == 0)
          {
            eh_landing_pad old_lp = (*cfun->eh->lp_array)[old_index];
            eh_landing_pad new_lp = gen_eh_landing_pad (old_lp->region);
            new_lp->post_landing_pad = old_lp->post_landing_pad;
            new_lp->landing_pad = new_label;
            index_map[old_index] = new_lp->index;
          }
        XEXP (note, 0) = GEN_INT (index_map[old_index]);

        /* Adjust the edge to the new destination.  */
        redirect_edge_succ (e, new_bb);
      }
    else
      ei_next (&ei);
}

/* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
   Add a new landing pad that will just jump to the old one and split the
   edges so that no EH edge crosses partitions.  */

static void
dw2_fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb)
{
  eh_landing_pad new_lp;
  edge_iterator ei;
  edge e;

  /* Generate the new landing-pad structure.  */
  new_lp = gen_eh_landing_pad (old_lp->region);
  new_lp->post_landing_pad = old_lp->post_landing_pad;
  new_lp->landing_pad = gen_label_rtx ();
  LABEL_PRESERVE_P (new_lp->landing_pad) = 1;

  /* Create the forwarder block.  */
  basic_block new_bb = create_eh_forwarder_block (new_lp->landing_pad, old_bb);

  /* Fix up the edges.  */
  for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
    if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
      {
        rtx_insn *insn = BB_END (e->src);
        rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);

        gcc_assert (note != NULL);
        gcc_checking_assert (INTVAL (XEXP (note, 0)) == old_lp->index);
        XEXP (note, 0) = GEN_INT (new_lp->index);

        /* Adjust the edge to the new destination.  */
        redirect_edge_succ (e, new_bb);
      }
    else
      ei_next (&ei);
}

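/* An illustrative example for the two fixups above (the index 3 and
   the fresh index N are hypothetical): if a throwing insn in the cold
   section carries the note (REG_EH_REGION 3) while landing pad 3 lives
   in the hot section, a new landing pad N sharing pad 3's region and
   post-landing pad is created in the cold section, the note is
   rewritten to (REG_EH_REGION N), and the EH edge is redirected to the
   forwarder block, which simply jumps to the original pad.  */
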

/* Ensure that all hot bbs are included in a hot path through the
   procedure.  This is done by calling this function twice, once
   with WALK_UP true (to look for paths from the entry to hot bbs) and
   once with WALK_UP false (to look for paths from hot bbs to the exit).
   Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
   to BBS_IN_HOT_PARTITION.  */

static unsigned int
sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
                    vec<basic_block> *bbs_in_hot_partition)
{
  /* Callers check this.  */
  gcc_checking_assert (cold_bb_count);

  /* Keep examining hot bbs while we still have some left to check
     and there are remaining cold bbs.  */
  vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
  while (! hot_bbs_to_check.is_empty ()
         && cold_bb_count)
    {
      basic_block bb = hot_bbs_to_check.pop ();
      vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
      edge e;
      edge_iterator ei;
      profile_probability highest_probability
        = profile_probability::uninitialized ();
      profile_count highest_count = profile_count::uninitialized ();
      bool found = false;

      /* Walk the preds/succs and check if there is at least one already
         marked hot.  Keep track of the most frequent pred/succ so that we
         can mark it hot if we don't find one.  */
      FOR_EACH_EDGE (e, ei, edges)
        {
          basic_block reach_bb = walk_up ? e->src : e->dest;

          if (e->flags & EDGE_DFS_BACK)
            continue;

          /* Do not expect profile insanities when profile was not adjusted.  */
          if (e->probability == profile_probability::never ()
              || e->count () == profile_count::zero ())
            continue;

          if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
            {
              found = true;
              break;
            }
          /* The following tracks the hottest edge via the edge count; if
             that is uninitialized, the edge probability is used as a
             fallback below.  */
          if (!(e->count () < highest_count))
            highest_count = e->count ();
          if (!highest_probability.initialized_p ()
              || e->probability > highest_probability)
            highest_probability = e->probability;
        }

      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
         block (or unpartitioned, e.g. the entry block) then it is ok.  If not,
         then the most frequent pred (or succ) needs to be adjusted.  In the
         case where multiple preds/succs have the same frequency (e.g. a
         50-50 branch), then both will be adjusted.  */
      if (found)
        continue;

      FOR_EACH_EDGE (e, ei, edges)
        {
          if (e->flags & EDGE_DFS_BACK)
            continue;
          /* Do not expect profile insanities when profile was not adjusted.  */
          if (e->probability == profile_probability::never ()
              || e->count () == profile_count::zero ())
            continue;
          /* Select the hottest edge using the edge count; if it is
             uninitialized, fall back to the edge probability.  */
          if (highest_count.initialized_p ())
            {
              if (!(e->count () >= highest_count))
                continue;
            }
          else if (!(e->probability >= highest_probability))
            continue;

          basic_block reach_bb = walk_up ? e->src : e->dest;

          /* We have a hot bb with an immediate dominator that is cold.
             The dominator needs to be re-marked hot.  */
          BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
          if (dump_file)
            fprintf (dump_file, "Promoting bb %i to hot partition to sanitize "
                     "profile of bb %i in %s walk\n", reach_bb->index,
                     bb->index, walk_up ? "backward" : "forward");
          cold_bb_count--;

          /* Now we need to examine newly-hot reach_bb to see if it is also
             dominated by a cold bb.  */
          bbs_in_hot_partition->safe_push (reach_bb);
          hot_bbs_to_check.safe_push (reach_bb);
        }
    }
  hot_bbs_to_check.release ();

  return cold_bb_count;
}

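/* A worked example for sanitize_hot_paths (the counts are assumed):
   take a hot block B whose only non-back-edge preds P1 and P2 are both
   cold, each reached over an edge with count 50.  The first loop finds
   no hot pred, so FOUND stays false and HIGHEST_COUNT ends up 50; in
   the second loop both edges satisfy count >= HIGHEST_COUNT, so both
   P1 and P2 are promoted to the hot partition (the 50-50 case the
   comment above mentions), COLD_BB_COUNT drops by two, and both blocks
   are queued for re-examination.  */
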

/* Find the basic blocks that are rarely executed and need to be moved to
   a separate section of the .o file (to cut down on paging and improve
   cache locality).  Return a vector of all edges that cross.  */

static vec<edge>
find_rarely_executed_basic_blocks_and_crossing_edges (void)
{
  vec<edge> crossing_edges = vNULL;
  basic_block bb;
  edge e;
  edge_iterator ei;
  unsigned int cold_bb_count = 0;
  auto_vec<basic_block> bbs_in_hot_partition;

  propagate_unlikely_bbs_forward ();

  /* Mark which partition (hot/cold) each basic block belongs in.  */
  FOR_EACH_BB_FN (bb, cfun)
    {
      bool cold_bb = false;

      if (probably_never_executed_bb_p (cfun, bb))
        {
          /* Handle profile insanities created by upstream optimizations
             by also checking the incoming edge weights.  If there is a
             non-cold incoming edge, conservatively prevent this block
             from being split into the cold section.  */
          cold_bb = true;
          FOR_EACH_EDGE (e, ei, bb->preds)
            if (!probably_never_executed_edge_p (cfun, e))
              {
                cold_bb = false;
                break;
              }
        }
      if (cold_bb)
        {
          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
          cold_bb_count++;
        }
      else
        {
          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
          bbs_in_hot_partition.safe_push (bb);
        }
    }

  /* Ensure that hot bbs are included along a hot path from the entry to
     exit.  Several different possibilities may include cold bbs along all
     paths to/from a hot bb.  One is that there are edge weight insanities
     due to optimization phases that do not properly update basic block
     profile counts.  The second is that the entry of the function may not
     be hot, because it is entered fewer times than the number of profile
     training runs, but there is a loop inside the function that causes
     blocks within the function to be above the threshold for hotness.
     This is fixed by walking up from hot bbs to the entry block, and then
     down from hot bbs to the exit, performing partitioning fixups as
     necessary.  */
  if (cold_bb_count)
    {
      mark_dfs_back_edges ();
      cold_bb_count = sanitize_hot_paths (true, cold_bb_count,
                                          &bbs_in_hot_partition);
      if (cold_bb_count)
        sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition);

      hash_set <basic_block> set;
      find_bbs_reachable_by_hot_paths (&set);
      FOR_EACH_BB_FN (bb, cfun)
        if (!set.contains (bb))
          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
    }

  /* The format of .gcc_except_table does not allow landing pads to
     be in a different partition from the throw.  Fix this by either
     moving the landing pads or inserting forwarder landing pads.  */
  if (cfun->eh->lp_array)
    {
      const bool sjlj
        = (targetm_common.except_unwind_info (&global_options) == UI_SJLJ);
      unsigned i;
      eh_landing_pad lp;

      FOR_EACH_VEC_ELT (*cfun->eh->lp_array, i, lp)
        {
          bool all_same, all_diff;

          if (lp == NULL
              || lp->landing_pad == NULL_RTX
              || !LABEL_P (lp->landing_pad))
            continue;

          all_same = all_diff = true;
          bb = BLOCK_FOR_INSN (lp->landing_pad);
          FOR_EACH_EDGE (e, ei, bb->preds)
            {
              gcc_assert (e->flags & EDGE_EH);
              if (BB_PARTITION (bb) == BB_PARTITION (e->src))
                all_diff = false;
              else
                all_same = false;
            }

          if (all_same)
            ;
          else if (all_diff)
            {
              int which = BB_PARTITION (bb);
              which ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
              BB_SET_PARTITION (bb, which);
            }
          else if (sjlj)
            sjlj_fix_up_crossing_landing_pad (bb);
          else
            dw2_fix_up_crossing_landing_pad (lp, bb);

          /* There is a single, common landing pad in SJLJ mode.  */
          if (sjlj)
            break;
        }
    }

  /* Mark every edge that crosses between sections.  */
  FOR_EACH_BB_FN (bb, cfun)
    FOR_EACH_EDGE (e, ei, bb->succs)
      {
        unsigned int flags = e->flags;

        /* We should never have EDGE_CROSSING set yet.  */
        gcc_checking_assert ((flags & EDGE_CROSSING) == 0);

        if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
            && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
            && BB_PARTITION (e->src) != BB_PARTITION (e->dest))
          {
            crossing_edges.safe_push (e);
            flags |= EDGE_CROSSING;
          }

        /* Now that we've split eh edges as appropriate, allow landing pads
           to be merged with the post-landing pads.  */
        flags &= ~EDGE_PRESERVE;

        e->flags = flags;
      }

  return crossing_edges;
}

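/* Illustrative note on the partition encoding used throughout this
   file: BB_HOT_PARTITION and BB_COLD_PARTITION are two disjoint flag
   bits, which is why a block can be flipped to the opposite section
   with an XOR, as done for the all-predecessors-cross landing pad
   case above:

     int which = BB_PARTITION (bb);
     which ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
     BB_SET_PARTITION (bb, which);  */
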
/* Set the flag EDGE_CAN_FALLTHRU for edges that can be fallthru.  */

static void
set_edge_can_fallthru_flag (void)
{
  basic_block bb;

  FOR_EACH_BB_FN (bb, cfun)
    {
      edge e;
      edge_iterator ei;

      FOR_EACH_EDGE (e, ei, bb->succs)
        {
          e->flags &= ~EDGE_CAN_FALLTHRU;

          /* A FALLTHRU edge is also a CAN_FALLTHRU edge.  */
          if (e->flags & EDGE_FALLTHRU)
            e->flags |= EDGE_CAN_FALLTHRU;
        }

      /* If the BB ends with an invertible condjump, all (2) edges are
         CAN_FALLTHRU edges.  */
      if (EDGE_COUNT (bb->succs) != 2)
        continue;
      if (!any_condjump_p (BB_END (bb)))
        continue;

      rtx_jump_insn *bb_end_jump = as_a <rtx_jump_insn *> (BB_END (bb));
      if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0))
        continue;
      invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0);
      EDGE_SUCC (bb, 0)->flags |= EDGE_CAN_FALLTHRU;
      EDGE_SUCC (bb, 1)->flags |= EDGE_CAN_FALLTHRU;
    }
}

/* If any destination of a crossing edge does not have a label, add a
   label; convert any easy fall-through crossing edges to unconditional
   jumps.  */

static void
add_labels_and_missing_jumps (vec<edge> crossing_edges)
{
  size_t i;
  edge e;

  FOR_EACH_VEC_ELT (crossing_edges, i, e)
    {
      basic_block src = e->src;
      basic_block dest = e->dest;
      rtx_jump_insn *new_jump;

      if (dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
        continue;

      /* Make sure dest has a label.  */
      rtx_code_label *label = block_label (dest);

      /* Nothing to do for non-fallthru edges.  */
      if (src == ENTRY_BLOCK_PTR_FOR_FN (cfun))
        continue;
      if ((e->flags & EDGE_FALLTHRU) == 0)
        continue;

      /* If the block does not end with a control flow insn, then we
         can trivially add a jump to the end to fix up the crossing.
         Otherwise the jump will have to go in a new bb, which will
         be handled by the fix_up_fall_thru_edges function.  */
      if (control_flow_insn_p (BB_END (src)))
        continue;

      /* Make sure there's only one successor.  */
      gcc_assert (single_succ_p (src));

      new_jump = emit_jump_insn_after (targetm.gen_jump (label), BB_END (src));
      BB_END (src) = new_jump;
      JUMP_LABEL (new_jump) = label;
      LABEL_NUSES (label) += 1;

      emit_barrier_after_bb (src);

      /* Mark the edge as non-fallthru.  */
      e->flags &= ~EDGE_FALLTHRU;
    }
}

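/* A before/after sketch for add_labels_and_missing_jumps (block names
   are hypothetical).  For a crossing fallthru edge out of a block SRC
   that does not end in a control flow insn:

     before:  SRC:  ...              after:  SRC:  ...
                    (falls through)                jump L1
                                                   (barrier)
              DEST: ...                      DEST: L1: ...

   DEST gains a label if it lacked one, SRC gains an explicit
   unconditional jump plus a barrier, and the edge stops being
   EDGE_FALLTHRU, since one section may not fall through into the
   other.  */
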
/* Find any bb's where the fall-through edge is a crossing edge (note that
   these bb's must also contain a conditional jump or end with a call
   instruction; we've already dealt with fall-through edges for blocks
   that didn't have a conditional jump or didn't end with a call
   instruction in the call to add_labels_and_missing_jumps).  Convert the
   fall-through edge to a non-crossing edge by inserting a new bb to fall
   through into.  The new bb will contain an unconditional jump (crossing
   edge) to the original fall-through destination.  */

static void
fix_up_fall_thru_edges (void)
{
  basic_block cur_bb;

  FOR_EACH_BB_FN (cur_bb, cfun)
    {
      edge succ1;
      edge succ2;
      edge fall_thru = NULL;
      edge cond_jump = NULL;

      if (EDGE_COUNT (cur_bb->succs) > 0)
        succ1 = EDGE_SUCC (cur_bb, 0);
      else
        succ1 = NULL;

      if (EDGE_COUNT (cur_bb->succs) > 1)
        succ2 = EDGE_SUCC (cur_bb, 1);
      else
        succ2 = NULL;

      /* Find the fall-through edge.  */

      if (succ1
          && (succ1->flags & EDGE_FALLTHRU))
        {
          fall_thru = succ1;
          cond_jump = succ2;
        }
      else if (succ2
               && (succ2->flags & EDGE_FALLTHRU))
        {
          fall_thru = succ2;
          cond_jump = succ1;
        }
      else if (succ2 && EDGE_COUNT (cur_bb->succs) > 2)
        fall_thru = find_fallthru_edge (cur_bb->succs);

      if (fall_thru && (fall_thru->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)))
        {
          /* Check to see if the fall-thru edge is a crossing edge.  */

          if (fall_thru->flags & EDGE_CROSSING)
            {
              /* The fall_thru edge crosses; now check the cond jump edge,
                 if it exists.  */

              bool cond_jump_crosses = true;
              int invert_worked = 0;
              rtx_insn *old_jump = BB_END (cur_bb);

              /* Find the jump instruction, if there is one.  */

              if (cond_jump)
                {
                  if (!(cond_jump->flags & EDGE_CROSSING))
                    cond_jump_crosses = false;

                  /* We know the fall-thru edge crosses; if the cond
                     jump edge does NOT cross, and its destination is the
                     next block in the bb order, invert the jump
                     (i.e. fix it so the fall through does not cross and
                     the cond jump does).  */

                  if (!cond_jump_crosses)
                    {
                      /* Find the label in the fall_thru block.  We've
                         already added any missing labels, so there must
                         be one.  */

                      rtx_code_label *fall_thru_label
                        = block_label (fall_thru->dest);

                      if (old_jump && fall_thru_label)
                        {
                          rtx_jump_insn *old_jump_insn
                            = dyn_cast <rtx_jump_insn *> (old_jump);
                          if (old_jump_insn)
                            invert_worked = invert_jump (old_jump_insn,
                                                         fall_thru_label, 0);
                        }

                      if (invert_worked)
                        {
                          fall_thru->flags &= ~EDGE_FALLTHRU;
                          cond_jump->flags |= EDGE_FALLTHRU;
                          update_br_prob_note (cur_bb);
                          std::swap (fall_thru, cond_jump);
                          cond_jump->flags |= EDGE_CROSSING;
                          fall_thru->flags &= ~EDGE_CROSSING;
                        }
                    }
                }

              if (cond_jump_crosses || !invert_worked)
                {
                  /* This is the case where both edges out of the basic
                     block are crossing edges.  Here we will fix up the
                     fall through edge.  The jump edge will be taken care
                     of later.  The EDGE_CROSSING flag of the fall_thru
                     edge is unset before the call to the force_nonfallthru
                     function because if a new basic-block is created
                     this edge remains in the current section boundary
                     while the edge between new_bb and the fall_thru->dest
                     becomes EDGE_CROSSING.  */

                  fall_thru->flags &= ~EDGE_CROSSING;
                  basic_block new_bb = force_nonfallthru (fall_thru);

                  if (new_bb)
                    {
                      new_bb->aux = cur_bb->aux;
                      cur_bb->aux = new_bb;

                      /* This is done by force_nonfallthru_and_redirect.  */
                      gcc_assert (BB_PARTITION (new_bb)
                                  == BB_PARTITION (cur_bb));

                      single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
                    }
                  else
                    {
                      /* If a new basic-block was not created, restore
                         the EDGE_CROSSING flag.  */
                      fall_thru->flags |= EDGE_CROSSING;
                    }

                  /* Add a barrier after the new jump.  */
                  emit_barrier_after_bb (new_bb ? new_bb : cur_bb);
                }
            }
        }
    }
}

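/* A sketch of the inversion case in fix_up_fall_thru_edges (labels are
   hypothetical).  With the fallthru edge crossing and the conditional
   edge staying local,

     if (cc) goto L_local;       becomes   if (!cc) goto L_crossing;
     ... fallthru crosses ...              ... fallthru stays local ...

   i.e. inverting the condition swaps the two edges' roles: the
   crossing transfer is now the explicit conditional jump (which
   fix_crossing_conditional_branches handles later) and the fallthru
   edge no longer crosses.  */
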
/* This function checks the destination block of a "crossing jump" to
   see if it has any crossing predecessors that begin with a code label
   and end with an unconditional jump.  If so, it returns that predecessor
   block.  (This is to avoid creating lots of new basic blocks that all
   contain unconditional jumps to the same destination.)  */

static basic_block
find_jump_block (basic_block jump_dest)
{
  basic_block source_bb = NULL;
  edge e;
  rtx_insn *insn;
  edge_iterator ei;

  FOR_EACH_EDGE (e, ei, jump_dest->preds)
    if (e->flags & EDGE_CROSSING)
      {
        basic_block src = e->src;

        /* Check each predecessor to see if it has a label, and contains
           only one executable instruction, which is an unconditional jump.
           If so, we can use it.  */

        if (LABEL_P (BB_HEAD (src)))
          for (insn = BB_HEAD (src);
               !INSN_P (insn) && insn != NEXT_INSN (BB_END (src));
               insn = NEXT_INSN (insn))
            {
              if (INSN_P (insn)
                  && insn == BB_END (src)
                  && JUMP_P (insn)
                  && !any_condjump_p (insn))
                {
                  source_bb = src;
                  break;
                }
            }

        if (source_bb)
          break;
      }

  return source_bb;
}

/* Find all BB's with conditional jumps that are crossing edges;
   insert a new bb and make the conditional jump branch to the new
   bb instead (make the new bb the same color so the conditional branch
   won't be a 'crossing' edge).  Insert an unconditional jump from the
   new bb to the original destination of the conditional jump.  */

static void
fix_crossing_conditional_branches (void)
{
  basic_block cur_bb;
  basic_block new_bb;
  basic_block dest;
  edge succ1;
  edge succ2;
  edge crossing_edge;
  edge new_edge;
  rtx set_src;
  rtx old_label = NULL_RTX;
  rtx_code_label *new_label;

  FOR_EACH_BB_FN (cur_bb, cfun)
    {
      crossing_edge = NULL;
      if (EDGE_COUNT (cur_bb->succs) > 0)
        succ1 = EDGE_SUCC (cur_bb, 0);
      else
        succ1 = NULL;

      if (EDGE_COUNT (cur_bb->succs) > 1)
        succ2 = EDGE_SUCC (cur_bb, 1);
      else
        succ2 = NULL;

      /* We already took care of fall-through edges, so only one successor
         can be a crossing edge.  */

      if (succ1 && (succ1->flags & EDGE_CROSSING))
        crossing_edge = succ1;
      else if (succ2 && (succ2->flags & EDGE_CROSSING))
        crossing_edge = succ2;

      if (crossing_edge)
        {
          rtx_insn *old_jump = BB_END (cur_bb);

          /* Check to make sure the jump instruction is a
             conditional jump.  */

          set_src = NULL_RTX;

          if (any_condjump_p (old_jump))
            {
              if (GET_CODE (PATTERN (old_jump)) == SET)
                set_src = SET_SRC (PATTERN (old_jump));
              else if (GET_CODE (PATTERN (old_jump)) == PARALLEL)
                {
                  set_src = XVECEXP (PATTERN (old_jump), 0, 0);
                  if (GET_CODE (set_src) == SET)
                    set_src = SET_SRC (set_src);
                  else
                    set_src = NULL_RTX;
                }
            }

          if (set_src && (GET_CODE (set_src) == IF_THEN_ELSE))
            {
              rtx_jump_insn *old_jump_insn
                = as_a <rtx_jump_insn *> (old_jump);

              if (GET_CODE (XEXP (set_src, 1)) == PC)
                old_label = XEXP (set_src, 2);
              else if (GET_CODE (XEXP (set_src, 2)) == PC)
                old_label = XEXP (set_src, 1);

              /* Check to see if a new bb for jumping to that dest has
                 already been created; if so, use it; if not, create
                 a new one.  */

              new_bb = find_jump_block (crossing_edge->dest);

              if (new_bb)
                new_label = block_label (new_bb);
              else
                {
                  basic_block last_bb;
                  rtx_code_label *old_jump_target;
                  rtx_jump_insn *new_jump;

                  /* Create a new basic block to be the destination of the
                     conditional jump, and put the appropriate instructions
                     in it.  */

                  new_label = gen_label_rtx ();
                  emit_label (new_label);

                  gcc_assert (GET_CODE (old_label) == LABEL_REF);
                  old_jump_target = old_jump_insn->jump_target ();
                  new_jump = as_a <rtx_jump_insn *>
                    (emit_jump_insn (targetm.gen_jump (old_jump_target)));
                  new_jump->set_jump_target (old_jump_target);

                  last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
                  new_bb = create_basic_block (new_label, new_jump, last_bb);
                  new_bb->aux = last_bb->aux;
                  last_bb->aux = new_bb;

                  emit_barrier_after_bb (new_bb);

                  /* Make sure the new bb is in the same partition as the
                     source of the conditional branch.  */
                  BB_COPY_PARTITION (new_bb, cur_bb);
                }

              /* Make the old jump branch to the new bb.  */

              redirect_jump (old_jump_insn, new_label, 0);

              /* Remove crossing_edge as a predecessor of 'dest'.  */

              dest = crossing_edge->dest;

              redirect_edge_succ (crossing_edge, new_bb);

              /* Make a new edge from new_bb to the old dest; the new edge
                 will be a successor for new_bb and a predecessor
                 for 'dest'.  */

              if (EDGE_COUNT (new_bb->succs) == 0)
                new_edge = make_single_succ_edge (new_bb, dest, 0);
              else
                new_edge = EDGE_SUCC (new_bb, 0);

              crossing_edge->flags &= ~EDGE_CROSSING;
              new_edge->flags |= EDGE_CROSSING;
            }
        }
    }
}

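/* A before/after sketch for fix_crossing_conditional_branches (names
   are hypothetical):

     before:  A: if (cc) goto DEST       DEST in the other section
     after:   A: if (cc) goto NEW        NEW in A's own section
              NEW: jump DEST             only this jump crosses

   The conditional branch itself no longer crosses; the crossing is
   funneled through the unconditional jump in NEW, and find_jump_block
   lets several conditional branches with the same target share a
   single NEW block.  */
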
/* Find any unconditional branches that cross between hot and cold
   sections.  Convert them into indirect jumps instead.  */

static void
fix_crossing_unconditional_branches (void)
{
  basic_block cur_bb;
  rtx_insn *last_insn;
  rtx label;
  rtx label_addr;
  rtx_insn *indirect_jump_sequence;
  rtx_insn *jump_insn = NULL;
  rtx new_reg;
  rtx_insn *cur_insn;
  edge succ;

  FOR_EACH_BB_FN (cur_bb, cfun)
    {
      last_insn = BB_END (cur_bb);

      if (EDGE_COUNT (cur_bb->succs) < 1)
        continue;

      succ = EDGE_SUCC (cur_bb, 0);

      /* Check to see if bb ends in a crossing (unconditional) jump.  At
         this point, no crossing jumps should be conditional.  */

      if (JUMP_P (last_insn)
          && (succ->flags & EDGE_CROSSING))
        {
          gcc_assert (!any_condjump_p (last_insn));

          /* Make sure the jump is not already an indirect or table jump.  */

          if (!computed_jump_p (last_insn)
              && !tablejump_p (last_insn, NULL, NULL))
            {
              /* We have found a "crossing" unconditional branch.  Now
                 we must convert it to an indirect jump.  First create a
                 reference to the label, as the target for the jump.  */

              label = JUMP_LABEL (last_insn);
              label_addr = gen_rtx_LABEL_REF (Pmode, label);
              LABEL_NUSES (label) += 1;

              /* Get a register to use for the indirect jump.  */

              new_reg = gen_reg_rtx (Pmode);

              /* Generate the indirect jump sequence.  */

              start_sequence ();
              emit_move_insn (new_reg, label_addr);
              emit_indirect_jump (new_reg);
              indirect_jump_sequence = get_insns ();
              end_sequence ();

              /* Make sure every instruction in the new jump sequence has
                 its basic block set to be cur_bb.  */

              for (cur_insn = indirect_jump_sequence; cur_insn;
                   cur_insn = NEXT_INSN (cur_insn))
                {
                  if (!BARRIER_P (cur_insn))
                    BLOCK_FOR_INSN (cur_insn) = cur_bb;
                  if (JUMP_P (cur_insn))
                    jump_insn = cur_insn;
                }

              /* Insert the new (indirect) jump sequence immediately before
                 the unconditional jump, then delete the unconditional
                 jump.  */

              emit_insn_before (indirect_jump_sequence, last_insn);
              delete_insn (last_insn);

              JUMP_LABEL (jump_insn) = label;
              LABEL_NUSES (label)++;

              /* Make BB_END for cur_bb be the jump instruction (NOT the
                 barrier instruction at the end of the sequence...).  */

              BB_END (cur_bb) = jump_insn;
            }
        }
    }
}

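/* An illustrative sketch of the replacement emitted above (modes
   omitted; exact RTL is target-dependent):

     (set (reg tmp) (label_ref L))   ; load the target address
     (set (pc) (reg tmp))            ; indirect jump

   presumably because an indirect jump can reach the other text section
   no matter how far apart the linker places the two sections, while a
   direct unconditional branch may have a limited displacement.  */
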
/* Update CROSSING_JUMP_P flags on all jump insns.  */

static void
update_crossing_jump_flags (void)
{
  basic_block bb;
  edge e;
  edge_iterator ei;

  FOR_EACH_BB_FN (bb, cfun)
    FOR_EACH_EDGE (e, ei, bb->succs)
      if (e->flags & EDGE_CROSSING)
        {
          if (JUMP_P (BB_END (bb)))
            CROSSING_JUMP_P (BB_END (bb)) = 1;
          break;
        }
}

/* Reorder basic blocks using the software trace cache (STC) algorithm.  */

static void
reorder_basic_blocks_software_trace_cache (void)
{
  if (dump_file)
    fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");

  int n_traces;
  int i;
  struct trace *traces;

  /* We estimate the length of an uncond jump insn only once, since the
     code for getting the insn length always returns the minimal length
     now.  */
  if (uncond_jump_length == 0)
    uncond_jump_length = get_uncond_jump_length ();

  /* We need to know some information for each basic block.  */
  array_size = GET_ARRAY_SIZE (last_basic_block_for_fn (cfun));
  bbd = XNEWVEC (bbro_basic_block_data, array_size);
  for (i = 0; i < array_size; i++)
    {
      bbd[i].start_of_trace = -1;
      bbd[i].end_of_trace = -1;
      bbd[i].in_trace = -1;
      bbd[i].visited = 0;
      bbd[i].priority = -1;
      bbd[i].heap = NULL;
      bbd[i].node = NULL;
    }

  traces = XNEWVEC (struct trace, n_basic_blocks_for_fn (cfun));
  n_traces = 0;
  find_traces (&n_traces, traces);
  connect_traces (n_traces, traces);
  FREE (traces);
  FREE (bbd);
}

/* Return true if edge E1 is more desirable as a fallthrough edge than
   edge E2 is.  */

static bool
edge_order (edge e1, edge e2)
{
  return e1->count () > e2->count ();
}

/* Reorder basic blocks using the "simple" algorithm.  This tries to
   maximize the dynamic number of branches that are fallthrough, without
   copying instructions.  The algorithm is greedy, looking at the most
   frequently executed branch first.  */

static void
reorder_basic_blocks_simple (void)
{
  if (dump_file)
    fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");

  edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)];

  /* First, collect all edges that can be optimized by reordering blocks:
     simple jumps and conditional jumps, as well as the function entry
     edge.  */

  int n = 0;
  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);

  basic_block bb;
  FOR_EACH_BB_FN (bb, cfun)
    {
      rtx_insn *end = BB_END (bb);

      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
        continue;

      /* We cannot optimize asm goto.  */
      if (JUMP_P (end) && extract_asm_operands (end))
        continue;

      if (single_succ_p (bb))
        edges[n++] = EDGE_SUCC (bb, 0);
      else if (any_condjump_p (end))
        {
          edge e0 = EDGE_SUCC (bb, 0);
          edge e1 = EDGE_SUCC (bb, 1);
          /* When optimizing for size it is best to keep the original
             fallthrough edges.  */
          if (e1->flags & EDGE_FALLTHRU)
            std::swap (e0, e1);
          edges[n++] = e0;
          edges[n++] = e1;
        }
    }

  /* Sort the edges, the most desirable first.  When optimizing for size
     all edges are equally desirable.  */

  if (optimize_function_for_speed_p (cfun))
    std::stable_sort (edges, edges + n, edge_order);

  /* Now decide which of those edges to make fallthrough edges.  We set
     BB_VISITED if a block already has a fallthrough successor assigned
     to it.  We make ->AUX of an endpoint point to the opposite endpoint
     of a sequence of blocks that fall through, and ->AUX will be NULL
     for a block that is in such a sequence but not an endpoint anymore.

     To start with, everything points to itself, nothing is assigned yet.  */

  FOR_ALL_BB_FN (bb, cfun)
    {
      bb->aux = bb;
      bb->flags &= ~BB_VISITED;
    }

  EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;

  /* Now for all edges, the most desirable first, see if that edge can
     connect two sequences.  If it can, update AUX and BB_VISITED; if it
     cannot, zero out the edge in the table.  */

  for (int j = 0; j < n; j++)
    {
      edge e = edges[j];

      basic_block tail_a = e->src;
      basic_block head_b = e->dest;
      basic_block head_a = (basic_block) tail_a->aux;
      basic_block tail_b = (basic_block) head_b->aux;

      /* An edge cannot connect two sequences if:
         - it crosses partitions;
         - its src is not a current endpoint;
         - its dest is not a current endpoint;
         - or, it would create a loop.  */

      if (e->flags & EDGE_CROSSING
          || tail_a->flags & BB_VISITED
          || !tail_b
          || (!(head_b->flags & BB_VISITED) && head_b != tail_b)
          || tail_a == tail_b)
        {
          edges[j] = 0;
          continue;
        }

      tail_a->aux = 0;
      head_b->aux = 0;
      head_a->aux = tail_b;
      tail_b->aux = head_a;
      tail_a->flags |= BB_VISITED;
    }

  /* Put the pieces together, in the same order that the start blocks of
     the sequences already had.  The hot/cold partitioning gives a little
     complication: as a first pass only do this for blocks in the same
     partition as the start block, and (if there is anything left to do)
     in a second pass handle the other partition.  */

  basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;

  int current_partition
    = BB_PARTITION (last_tail == ENTRY_BLOCK_PTR_FOR_FN (cfun)
                    ? EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0)->dest
                    : last_tail);
  bool need_another_pass = true;

  for (int pass = 0; pass < 2 && need_another_pass; pass++)
    {
      need_another_pass = false;

      FOR_EACH_BB_FN (bb, cfun)
        if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb)
          {
            if (BB_PARTITION (bb) != current_partition)
              {
                need_another_pass = true;
                continue;
              }

            last_tail->aux = bb;
            last_tail = (basic_block) bb->aux;
          }

      current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
    }

  last_tail->aux = 0;

  /* Finally, link all the chosen fallthrough edges.  */

  for (int j = 0; j < n; j++)
    if (edges[j])
      edges[j]->src->aux = edges[j]->dest;

  delete[] edges;

  /* If the entry edge no longer falls through we have to make a new
     block so it can do so again.  */

  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
    {
      force_nonfallthru (e);
      e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
    }
}

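/* A worked example for reorder_basic_blocks_simple (the CFG and counts
   are assumed): blocks A, B, C with edges A->B (count 90), A->C
   (count 10) and C->B (count 100).  Sorted by edge_order, C->B is
   committed first and chains C to B; A->B is then rejected because B
   is no longer the head of a sequence, and A->C is accepted.  The
   final layout A, C, B makes 110 of the dynamically executed branches
   fall through, versus the 90 that the original order A, B, C would
   keep.  */
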
/* Reorder basic blocks.  The main entry point to this file.  */

static void
reorder_basic_blocks (void)
{
  gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);

  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
    return;

  set_edge_can_fallthru_flag ();
  mark_dfs_back_edges ();

  switch (flag_reorder_blocks_algorithm)
    {
    case REORDER_BLOCKS_ALGORITHM_SIMPLE:
      reorder_basic_blocks_simple ();
      break;

    case REORDER_BLOCKS_ALGORITHM_STC:
      reorder_basic_blocks_software_trace_cache ();
      break;

    default:
      gcc_unreachable ();
    }

  relink_block_chain (/*stay_in_cfglayout_mode=*/true);

  if (dump_file)
    {
      if (dump_flags & TDF_DETAILS)
        dump_reg_info (dump_file);
      dump_flow_info (dump_file, dump_flags);
    }

  /* Signal that rtl_verify_flow_info_1 can now verify that there
     is at most one switch between hot/cold sections.  */
  crtl->bb_reorder_complete = true;
}

/* Determine which partition the first basic block in the function
   belongs to, then find the first basic block in the current function
   that belongs to a different section, and insert a
   NOTE_INSN_SWITCH_TEXT_SECTIONS note immediately before it in the
   instruction stream.  When writing out the assembly code,
   encountering this note will make the compiler switch between the
   hot and cold text sections.  */

void
insert_section_boundary_note (void)
{
  basic_block bb;
  bool switched_sections = false;
  int current_partition = 0;

  if (!crtl->has_bb_partition)
    return;

  FOR_EACH_BB_FN (bb, cfun)
    {
      if (!current_partition)
        current_partition = BB_PARTITION (bb);
      if (BB_PARTITION (bb) != current_partition)
        {
          gcc_assert (!switched_sections);
          switched_sections = true;
          emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
          current_partition = BB_PARTITION (bb);
        }
    }

  /* Make sure crtl->has_bb_partition matches reality even if bbpart finds
     some hot and some cold basic blocks, but later one of those kinds is
     optimized away.  */
  crtl->has_bb_partition = switched_sections;
}

namespace {

const pass_data pass_data_reorder_blocks =
{
  RTL_PASS, /* type */
  "bbro", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  0, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_reorder_blocks : public rtl_opt_pass
{
public:
  pass_reorder_blocks (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_reorder_blocks, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *)
    {
      if (targetm.cannot_modify_jumps_p ())
        return false;
      return (optimize > 0
              && (flag_reorder_blocks || flag_reorder_blocks_and_partition));
    }

  virtual unsigned int execute (function *);

}; // class pass_reorder_blocks

unsigned int
pass_reorder_blocks::execute (function *fun)
{
  basic_block bb;

  /* Last attempt to optimize CFG, as scheduling, peepholing and insn
     splitting possibly introduced more crossjumping opportunities.  */
  cfg_layout_initialize (CLEANUP_EXPENSIVE);

  reorder_basic_blocks ();
  cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_NO_PARTITIONING);

  FOR_EACH_BB_FN (bb, fun)
    if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
      bb->aux = bb->next_bb;
  cfg_layout_finalize ();

  return 0;
}

} // anon namespace

rtl_opt_pass *
make_pass_reorder_blocks (gcc::context *ctxt)
{
  return new pass_reorder_blocks (ctxt);
}

266338fd1498Szrj /* Duplicate a block (that we already know ends in a computed jump) into its
266438fd1498Szrj predecessors, where possible. Return whether anything is changed. */
266538fd1498Szrj static bool
maybe_duplicate_computed_goto(basic_block bb,int max_size)266638fd1498Szrj maybe_duplicate_computed_goto (basic_block bb, int max_size)
266738fd1498Szrj {
266838fd1498Szrj if (single_pred_p (bb))
266938fd1498Szrj return false;
267038fd1498Szrj
267138fd1498Szrj /* Make sure that the block is small enough. */
267238fd1498Szrj rtx_insn *insn;
267338fd1498Szrj FOR_BB_INSNS (bb, insn)
267438fd1498Szrj if (INSN_P (insn))
267538fd1498Szrj {
267638fd1498Szrj max_size -= get_attr_min_length (insn);
267738fd1498Szrj if (max_size < 0)
267838fd1498Szrj return false;
267938fd1498Szrj }
268038fd1498Szrj
268138fd1498Szrj bool changed = false;
268238fd1498Szrj edge e;
268338fd1498Szrj edge_iterator ei;
268438fd1498Szrj for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
268538fd1498Szrj {
268638fd1498Szrj basic_block pred = e->src;
268738fd1498Szrj
268838fd1498Szrj /* Do not duplicate BB into PRED if that is the last predecessor, or if
268938fd1498Szrj we cannot merge a copy of BB with PRED. */
269038fd1498Szrj if (single_pred_p (bb)
269138fd1498Szrj || !single_succ_p (pred)
269238fd1498Szrj || e->flags & EDGE_COMPLEX
269338fd1498Szrj || pred->index < NUM_FIXED_BLOCKS
269438fd1498Szrj || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
269538fd1498Szrj || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
269638fd1498Szrj {
269738fd1498Szrj ei_next (&ei);
269838fd1498Szrj continue;
269938fd1498Szrj }
270038fd1498Szrj
270138fd1498Szrj if (dump_file)
270238fd1498Szrj fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n",
270338fd1498Szrj bb->index, e->src->index);
270438fd1498Szrj
270538fd1498Szrj /* Remember if PRED can be duplicated; if so, the copy of BB merged
270638fd1498Szrj with PRED can be duplicated as well. */
270738fd1498Szrj bool can_dup_more = can_duplicate_block_p (pred);
270838fd1498Szrj
270938fd1498Szrj /* Make a copy of BB, merge it into PRED. */
271038fd1498Szrj basic_block copy = duplicate_block (bb, e, NULL);
271138fd1498Szrj emit_barrier_after_bb (copy);
271238fd1498Szrj reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
271338fd1498Szrj merge_blocks (pred, copy);
271438fd1498Szrj
271538fd1498Szrj changed = true;
271638fd1498Szrj
271738fd1498Szrj /* Try to merge the resulting merged PRED into further predecessors. */
271838fd1498Szrj if (can_dup_more)
271938fd1498Szrj maybe_duplicate_computed_goto (pred, max_size);
272038fd1498Szrj }
272138fd1498Szrj
272238fd1498Szrj return changed;
272338fd1498Szrj }
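
/* A minimal sketch of what one iteration above does (block numbers
   hypothetical).  B9 ends in a computed jump and has predecessors B3
   and B7, each ending in a simple jump to B9:

     Before:                      After duplicating B9 into B3:

       B3: ...  goto B9             B3: ...  <copy of B9>  jump *reg
       B7: ...  goto B9             B7: ...  goto B9
       B9: ...  jump *reg           B9: ...  jump *reg

   B7 is left alone once it becomes the last remaining predecessor
   (the single_pred_p check above), so the original B9 stays reachable.  */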

/* Duplicate the blocks containing computed gotos.  This basically unfactors
   computed gotos that were factored early on in the compilation process to
   speed up edge based data flow.  We used to leave them factored, which
   can seriously pessimize code with many computed jumps in the source code,
   such as interpreters.  See e.g. PR15242.  */
static void
duplicate_computed_gotos (function *fun)
{
  /* We are estimating the length of uncond jump insn only once
     since the code for getting the insn length always returns
     the minimal length now.  */
  if (uncond_jump_length == 0)
    uncond_jump_length = get_uncond_jump_length ();

  /* Never copy a block larger than this.  */
  int max_size
    = uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS);

  bool changed = false;

  /* Try to duplicate all blocks that end in a computed jump and that
     can be duplicated at all.  */
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
    if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
      changed |= maybe_duplicate_computed_goto (bb, max_size);

  /* Duplicating blocks will redirect edges and may cause hot blocks
     previously reached by both hot and cold blocks to become dominated
     only by cold blocks.  */
  if (changed)
    fixup_partitions ();
}
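
/* For context, the kind of source this pass helps (a sketch only, not
   code from PR15242): a bytecode interpreter using GCC's computed-goto
   extension, where every handler ends in the same indirect dispatch.

     static const void *dispatch[] = { &&do_add, &&do_sub };
     goto *dispatch[*pc++];
   do_add: acc += *pc++; goto *dispatch[*pc++];
   do_sub: acc -= *pc++; goto *dispatch[*pc++];

   Early in compilation those identical dispatch tails are factored
   into one shared block ending in a single 'goto *'; this pass copies
   that block back into each handler so the CPU's indirect-branch
   predictor sees a separate branch per handler.  */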

namespace {

const pass_data pass_data_duplicate_computed_gotos =
{
  RTL_PASS, /* type */
  "compgotos", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  0, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_duplicate_computed_gotos : public rtl_opt_pass
{
public:
  pass_duplicate_computed_gotos (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_duplicate_computed_gotos, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *);
  virtual unsigned int execute (function *);

}; // class pass_duplicate_computed_gotos

bool
pass_duplicate_computed_gotos::gate (function *fun)
{
  if (targetm.cannot_modify_jumps_p ())
    return false;
  return (optimize > 0
          && flag_expensive_optimizations
          && ! optimize_function_for_size_p (fun));
}

unsigned int
pass_duplicate_computed_gotos::execute (function *fun)
{
  duplicate_computed_gotos (fun);

  return 0;
}

} // anon namespace

rtl_opt_pass *
make_pass_duplicate_computed_gotos (gcc::context *ctxt)
{
  return new pass_duplicate_computed_gotos (ctxt);
}

/* This function is the main 'entrance' for the optimization that
   partitions hot and cold basic blocks into separate sections of the
   .o file (to improve performance and cache locality).  Ideally it
   would be called after all optimizations that rearrange the CFG have
   been called.  However part of this optimization may introduce new
   register usage, so it must be called before register allocation has
   occurred.  This means that this optimization is actually called
   well before the optimization that reorders basic blocks (see
   function above).

   This optimization checks the feedback information to determine
   which basic blocks are hot/cold, and updates flags on the basic
   blocks to indicate which section they belong in.  This information
   is later used for writing out sections in the .o file.  Because hot
   and cold sections can be arbitrarily large (within the bounds of
   memory), far beyond the size of a single function, it is necessary
   to fix up all edges that cross section boundaries, to make sure the
   instructions used can actually span the required distance.  The
   fixes are described below.

   Fall-through edges must be changed into jumps; it is not safe or
   legal to fall through across a section boundary.  Whenever a
   fall-through edge crossing a section boundary is encountered, a new
   basic block is inserted (in the same section as the fall-through
   source), and the fall-through edge is redirected to the new basic
   block.  The new basic block contains an unconditional jump to the
   original fall-through target.  (If the unconditional jump is
   insufficient to cross section boundaries, that is dealt with a
   little later, see below).

   In order to deal with architectures that have short conditional
   branches (which cannot span all of memory) we take any conditional
   jump that attempts to cross a section boundary and add a level of
   indirection: it becomes a conditional jump to a new basic block, in
   the same section.  The new basic block contains an unconditional
   jump to the original target, in the other section.

   For those architectures whose unconditional branch is also
   incapable of reaching all of memory, those unconditional jumps are
   converted into indirect jumps, through a register.

   IMPORTANT NOTE: This optimization causes some messy interactions
   with the cfg cleanup optimizations; those optimizations want to
   merge blocks wherever possible, and to collapse indirect jump
   sequences (change "A jumps to B jumps to C" directly into "A jumps
   to C").  Those optimizations can undo the jump fixes that
   partitioning is required to make (see above); the fixes exist to
   ensure that jumps attempting to cross section boundaries are really
   able to cover whatever distance the jump requires (on many
   architectures conditional or unconditional jumps are not able to
   reach all of memory).  Therefore tests have to be inserted into
   each such optimization to make sure that it does not undo the
   fixups necessary to cross partition boundaries.  This would be much
   less of a problem if we could perform this optimization later in
   the compilation, but unfortunately the fact that we may need to
   create indirect jumps (through registers) requires that this
   optimization be performed before register allocation.

   Hot and cold basic blocks are partitioned and put in separate
   sections of the .o file, to reduce paging and improve cache
   performance (hopefully).  This can result in bits of code from the
   same function being widely separated in the .o file.  However this
   separation is not apparent from the basic block structure itself.
   Therefore we must take care to ensure that: 1). There are no
   fall_thru edges that cross between sections; 2). For those
   architectures which have "short" conditional branches, all
   conditional branches that attempt to cross between sections are
   converted to unconditional branches; and, 3). For those
   architectures which have "short" unconditional branches, all
   unconditional branches that attempt to cross between sections are
   converted to indirect jumps.

   The code for fixing up fall_thru edges that cross between hot and
   cold basic blocks does so by creating new basic blocks containing
   unconditional branches to the appropriate label in the "other"
   section.  The new basic block is then put in the same (hot or cold)
   section as the original conditional branch, and the fall_thru edge
   is modified to fall into the new basic block instead.  By adding
   this level of indirection we end up with only unconditional branches
   crossing between hot and cold sections.

   Conditional branches are dealt with by adding a level of indirection.
   A new basic block is added in the same (hot/cold) section as the
   conditional branch, and the conditional branch is retargeted to the
   new basic block.  The new basic block contains an unconditional branch
   to the original target of the conditional branch (in the other section).

   Unconditional branches are dealt with by converting them into
   indirect jumps.  */
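
/* A minimal sketch of the conditional-branch fixup described above, in
   pseudo assembly (section and label names illustrative only).  Before,
   with .L_cold in the other section and a short-range conditional
   branch:

     .text.hot
       beq  r1, r2, .L_cold        ; may not reach the cold section

   After adding the indirection (the new stub block stays hot):

     .text.hot
       beq  r1, r2, .L_stub
       ...
     .L_stub:
       jmp  .L_cold                ; unconditional, longer range
     .text.unlikely
     .L_cold:
       ...

   If even the unconditional jump cannot span the distance, it is in
   turn rewritten as a load of the target address into a register
   followed by an indirect jump.  */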

namespace {

const pass_data pass_data_partition_blocks =
{
  RTL_PASS, /* type */
  "bbpart", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  PROP_cfglayout, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_partition_blocks : public rtl_opt_pass
{
public:
  pass_partition_blocks (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_partition_blocks, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *);
  virtual unsigned int execute (function *);

}; // class pass_partition_blocks

bool
pass_partition_blocks::gate (function *fun)
{
  /* The optimization to partition hot/cold basic blocks into separate
     sections of the .o file does not work well with linkonce, with
     user defined section attributes, or with the naked attribute.
     Don't call it if any of these cases arises.  */
  return (flag_reorder_blocks_and_partition
          && optimize
          /* See pass_reorder_blocks::gate.  We should not partition if
             we are going to omit the reordering.  */
          && optimize_function_for_speed_p (fun)
          && !DECL_COMDAT_GROUP (current_function_decl)
          && !lookup_attribute ("section", DECL_ATTRIBUTES (fun->decl))
          && !lookup_attribute ("naked", DECL_ATTRIBUTES (fun->decl))
          /* Workaround a bug in GDB where read_partial_die doesn't cope
             with DIEs with DW_AT_ranges, see PR81115.  */
          && !(in_lto_p && MAIN_NAME_P (DECL_NAME (fun->decl))));
}
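
/* For instance (illustrative declaration only), a function such as

     __attribute__ ((section ("io_handlers"))) void f (void);

   carries a "section" attribute in DECL_ATTRIBUTES, so the
   lookup_attribute check above rejects it and its blocks are never
   split between text sections.  */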

unsigned
pass_partition_blocks::execute (function *fun)
{
  vec<edge> crossing_edges;

  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
    return 0;

  df_set_flags (DF_DEFER_INSN_RESCAN);

  crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges ();
  if (!crossing_edges.exists ())
    /* Make sure to process deferred rescans and clear changeable df flags.  */
    return TODO_df_finish;

  crtl->has_bb_partition = true;

  /* Make sure the source of any crossing edge ends in a jump and the
     destination of any crossing edge has a label.  */
  add_labels_and_missing_jumps (crossing_edges);

  /* Convert all crossing fall_thru edges into non-crossing fall-thrus
     to new blocks containing unconditional jumps (that jump to the
     original fall-through dest).  */
  fix_up_fall_thru_edges ();

  /* If the architecture does not have conditional branches that can
     span all of memory, convert crossing conditional branches into
     crossing unconditional branches.  */
  if (!HAS_LONG_COND_BRANCH)
    fix_crossing_conditional_branches ();

  /* If the architecture does not have unconditional branches that
     can span all of memory, convert crossing unconditional branches
     into indirect jumps.  Since adding an indirect jump also adds
     a new register usage, update the register usage information as
     well.  */
  if (!HAS_LONG_UNCOND_BRANCH)
    fix_crossing_unconditional_branches ();

  update_crossing_jump_flags ();

  /* Clear bb->aux fields that the above routines were using.  */
  clear_aux_for_blocks ();

  crossing_edges.release ();

  /* ??? FIXME: DF generates the bb info for a block immediately.
     And by immediately, I mean *during* creation of the block.

        #0  df_bb_refs_collect
        #1  in df_bb_refs_record
        #2  in create_basic_block_structure

     Which means that the bb_has_eh_pred test in df_bb_refs_collect
     will *always* fail, because no edges can have been added to the
     block yet.  Which of course means we don't add the right
     artificial refs, which means we fail df_verify (much) later.

     Cleanest solution would seem to make DF_DEFER_INSN_RESCAN imply
     that we also shouldn't grab data from the new blocks those new
     insns are in either.  In this way one can create the block, link
     it up properly, and have everything Just Work later, when deferred
     insns are processed.

     In the meantime, we have no other option but to throw away all
     of the DF data and recompute it all.  */
  if (fun->eh->lp_array)
    {
      df_finish_pass (true);
      df_scan_alloc (NULL);
      df_scan_blocks ();
      /* Not all post-landing pads use all of the EH_RETURN_DATA_REGNO
         data.  We blindly generated all of them when creating the new
         landing pad.  Delete those assignments we don't use.  */
      df_set_flags (DF_LR_RUN_DCE);
      df_analyze ();
    }

  /* Make sure to process deferred rescans and clear changeable df flags.  */
  return TODO_df_finish;
}

} // anon namespace

rtl_opt_pass *
make_pass_partition_blocks (gcc::context *ctxt)
{
  return new pass_partition_blocks (ctxt);
}