gcc/doc/analyzer.texi

*4c3eb207Smrg@c Copyright (C) 2019 Free Software Foundation, Inc.
*4c3eb207Smrg@c This is part of the GCC manual.
*4c3eb207Smrg@c For copying conditions, see the file gcc.texi.
*4c3eb207Smrg@c Contributed by David Malcolm <dmalcolm@redhat.com>.
*4c3eb207Smrg
*4c3eb207Smrg@node Static Analyzer
*4c3eb207Smrg@chapter Static Analyzer
*4c3eb207Smrg@cindex analyzer
*4c3eb207Smrg@cindex static analysis
*4c3eb207Smrg@cindex static analyzer
*4c3eb207Smrg
*4c3eb207Smrg@menu
*4c3eb207Smrg* Analyzer Internals::       Analyzer Internals
*4c3eb207Smrg* Debugging the Analyzer::   Useful debugging tips
*4c3eb207Smrg@end menu
*4c3eb207Smrg
*4c3eb207Smrg@node Analyzer Internals
*4c3eb207Smrg@section Analyzer Internals
*4c3eb207Smrg@cindex analyzer, internals
*4c3eb207Smrg@cindex static analyzer, internals
*4c3eb207Smrg
*4c3eb207Smrg@subsection Overview
*4c3eb207Smrg
*4c3eb207SmrgThe analyzer implementation works on the gimple-SSA representation.
*4c3eb207Smrg(I chose this in the hopes of making it easy to work with LTO to
*4c3eb207Smrgdo whole-program analysis).
*4c3eb207Smrg
*4c3eb207SmrgThe implementation is read-only: it doesn't attempt to change anything,
*4c3eb207Smrgjust emit warnings.
*4c3eb207Smrg
*4c3eb207SmrgThe gimple representation can be seen using @option{-fdump-ipa-analyzer}.
*4c3eb207Smrg
*4c3eb207SmrgFirst, we build a @code{supergraph} which combines the callgraph and all
*4c3eb207Smrgof the CFGs into a single directed graph, with both interprocedural and
*4c3eb207Smrgintraprocedural edges.  The nodes and edges in the supergraph are called
*4c3eb207Smrg``supernodes'' and ``superedges'', and often referred to in code as
*4c3eb207Smrg@code{snodes} and @code{sedges}.  Basic blocks in the CFGs are split at
*4c3eb207Smrginterprocedural calls, so there can be more than one supernode per
*4c3eb207Smrgbasic block.  Most statements will be in just one supernode, but a call
*4c3eb207Smrgstatement can appear in two supernodes: at the end of one for the call,
*4c3eb207Smrgand again at the start of another for the return.
*4c3eb207Smrg
*4c3eb207SmrgThe supergraph can be seen using @option{-fdump-analyzer-supergraph}.
*4c3eb207Smrg
*4c3eb207SmrgWe then build an @code{analysis_plan} which walks the callgraph to
*4c3eb207Smrgdetermine which calls might be suitable for being summarized (rather
*4c3eb207Smrgthan fully explored) and thus in what order to explore the functions.
*4c3eb207Smrg
Next is the heart of the analyzer: we use a worklist to explore state
within the supergraph, building an ``exploded graph''.
Nodes in the exploded graph correspond to <point,@w{ }state> pairs, as in
@cite{Precise Interprocedural Dataflow Analysis via Graph Reachability}
(Thomas Reps, Susan Horwitz and Mooly Sagiv).
*4c3eb207Smrg
*4c3eb207SmrgWe reuse nodes for <point, state> pairs we've already seen, and avoid
*4c3eb207Smrgtracking state too closely, so that (hopefully) we rapidly converge
*4c3eb207Smrgon a final exploded graph, and terminate the analysis.  We also bail
*4c3eb207Smrgout if the number of exploded <end-of-basic-block, state> nodes gets
*4c3eb207Smrglarger than a particular multiple of the total number of basic blocks
*4c3eb207Smrg(to ensure termination in the face of pathological state-explosion
*4c3eb207Smrgcases, or bugs).  We also stop exploring a point once we hit a limit
*4c3eb207Smrgof states for that point.
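
The worklist exploration described above can be illustrated with a toy
model.  This is only a sketch, not the analyzer's implementation: the
graph, the transfer function and every name here (@code{explore},
@code{toy_enode}, @code{transfer}) are invented for the example, and
states are plain small integers rather than real program states.  It
shows two of the ideas: reusing a node when a <point, state> pair has
already been seen, and bailing out once a node limit is reached.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy sizes; every name in this sketch is hypothetical.  */
enum { NUM_POINTS = 4, NUM_STATES = 3 };

/* A tiny diamond-shaped graph: 0 -> {1, 2} -> 3.  -1 means "no edge".  */
static const int successors[NUM_POINTS][2] = {
  { 1, 2 }, { 3, -1 }, { 3, -1 }, { -1, -1 }
};

/* Toy transfer function: only point 1 changes the state (saturating).  */
static int
transfer (int point, int state)
{
  if (point == 1 && state + 1 < NUM_STATES)
    return state + 1;
  return state;
}

struct toy_enode { int point; int state; };

/* Explore from <point 0, ENTRY_STATE>.  Return the number of exploded
   nodes created, or -1 if the MAX_ENODES limit was hit (or the
   arguments are out of range).  */
int
explore (int entry_state, int max_enodes)
{
  bool seen[NUM_POINTS][NUM_STATES] = { { false } };
  /* Each <point, state> pair is queued at most once, so this suffices.  */
  struct toy_enode worklist[NUM_POINTS * NUM_STATES];
  size_t head = 0, tail = 0;
  int num_enodes = 0;

  if (entry_state < 0 || entry_state >= NUM_STATES || max_enodes < 1)
    return -1;

  seen[0][entry_state] = true;
  worklist[tail++] = (struct toy_enode) { 0, entry_state };
  num_enodes = 1;

  while (head < tail)
    {
      struct toy_enode cur = worklist[head++];
      int next_state = transfer (cur.point, cur.state);
      for (int i = 0; i < 2; i++)
        {
          int succ = successors[cur.point][i];
          if (succ < 0)
            continue;
          if (seen[succ][next_state])
            continue;           /* Reuse the existing node.  */
          if (num_enodes >= max_enodes)
            return -1;          /* Bail out: too many nodes.  */
          seen[succ][next_state] = true;
          worklist[tail++] = (struct toy_enode) { succ, next_state };
          num_enodes++;
        }
    }
  return num_enodes;
}
```

With these toy definitions, @code{explore (0, 64)} creates five exploded
nodes, because point 3 is reached in two different states, whereas
@code{explore (2, 64)} creates only four, because both branches reach
point 3 in the same state and the node is reused.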
*4c3eb207Smrg
*4c3eb207SmrgWe can identify problems directly when processing a <point,@w{ }state>
*4c3eb207Smrginstance.  For example, if we're finding the successors of
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg   <point: before-stmt: "free (ptr);",
*4c3eb207Smrg    state: @{"ptr": freed@}>
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
*4c3eb207Smrgthen we can detect a double-free of "ptr".  We can then emit a path
*4c3eb207Smrgto reach the problem by finding the simplest route through the graph.
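
The detection step can be pictured as a transition in a small state
machine.  The following is a minimal sketch under stated assumptions: the
state names and the @code{on_free} helper are hypothetical and are not
the analyzer's own checker code; only the ``freed'' state is taken from
the example above.

```c
#include <stdbool.h>

/* Hypothetical per-pointer states for a toy malloc/free checker.  */
enum ptr_state { PTR_START, PTR_ALLOCATED, PTR_FREED };

/* Model the effect of "free (ptr);" on *STATE.
   Return true if this call is a double-free, i.e. the pointer was
   already in the "freed" state before the call.  */
bool
on_free (enum ptr_state *state)
{
  if (*state == PTR_FREED)
    return true;
  *state = PTR_FREED;
  return false;
}
```

Calling @code{on_free} twice on the same state object reports the
problem on the second call, which mirrors finding a @code{free} at a
point whose incoming state already records the pointer as freed.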
*4c3eb207Smrg
*4c3eb207SmrgProgram points in the analysis are much more fine-grained than in the
*4c3eb207SmrgCFG and supergraph, with points (and thus potentially exploded nodes)
*4c3eb207Smrgfor various events, including before individual statements.
*4c3eb207SmrgBy default the exploded graph merges multiple consecutive statements
*4c3eb207Smrgin a supernode into one exploded edge to minimize the size of the
*4c3eb207Smrgexploded graph.  This can be suppressed via
*4c3eb207Smrg@option{-fanalyzer-fine-grained}.
The fine-grained approach seems to make things simpler and more debuggable
than other approaches I tried, in that each point is responsible for one
thing.
*4c3eb207Smrg
*4c3eb207SmrgProgram points in the analysis also have a "call string" identifying the
*4c3eb207Smrgstack of callsites below them, so that paths in the exploded graph
*4c3eb207Smrgcorrespond to interprocedurally valid paths: we always return to the
*4c3eb207Smrgcorrect call site, propagating state information accordingly.
We avoid infinite recursion by stopping the analysis if a callsite
appears more than @code{analyzer-max-recursion-depth} times in a call
string (defaulting to 2).
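
The recursion limit amounts to counting occurrences of a callsite in a
call string.  The sketch below assumes a call string is simply an array
of integer callsite IDs; that representation and the function names are
invented for illustration and do not reflect the analyzer's own
@code{call_string} class.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count how often CALLSITE occurs in the call string.  */
static size_t
count_occurrences (const int *call_string, size_t len, int callsite)
{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    if (call_string[i] == callsite)
      count++;
  return count;
}

/* Return true if pushing CALLSITE onto the call string would make it
   appear more than MAX_DEPTH times, so exploration should stop.  */
bool
exceeds_recursion_limit (const int *call_string, size_t len,
                         int callsite, size_t max_depth)
{
  return count_occurrences (call_string, len, callsite) + 1 > max_depth;
}
```

With a limit of 2, a call string that already contains the callsite
twice cannot be extended with a third occurrence.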
*4c3eb207Smrg
*4c3eb207Smrg@subsection Graphs
*4c3eb207Smrg
*4c3eb207SmrgNodes and edges in the exploded graph are called ``exploded nodes'' and
*4c3eb207Smrg``exploded edges'' and often referred to in the code as
*4c3eb207Smrg@code{enodes} and @code{eedges} (especially when distinguishing them
*4c3eb207Smrgfrom the @code{snodes} and @code{sedges} in the supergraph).
*4c3eb207Smrg
Each graph numbers its nodes, giving unique identifiers: supernodes
are referred to throughout dumps in the form @samp{SN: @var{index}} and
exploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and
@samp{EN: 29}).
*4c3eb207Smrg
The supergraph can be seen using @option{-fdump-analyzer-supergraph}.
*4c3eb207Smrg
*4c3eb207SmrgThe exploded graph can be seen using @option{-fdump-analyzer-exploded-graph}
*4c3eb207Smrgand other dump options.  Exploded nodes are color-coded in the .dot output
*4c3eb207Smrgbased on state-machine states to make it easier to see state changes at
*4c3eb207Smrga glance.
*4c3eb207Smrg
*4c3eb207Smrg@subsection State Tracking
*4c3eb207Smrg
*4c3eb207SmrgThere's a tension between:
*4c3eb207Smrg@itemize @bullet
*4c3eb207Smrg@item
*4c3eb207Smrgprecision of analysis in the straight-line case, vs
*4c3eb207Smrg@item
*4c3eb207Smrgexponential blow-up in the face of control flow.
*4c3eb207Smrg@end itemize
*4c3eb207Smrg
*4c3eb207SmrgFor example, in general, given this CFG:
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg      A
*4c3eb207Smrg     / \
*4c3eb207Smrg    B   C
*4c3eb207Smrg     \ /
*4c3eb207Smrg      D
*4c3eb207Smrg     / \
*4c3eb207Smrg    E   F
*4c3eb207Smrg     \ /
*4c3eb207Smrg      G
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
*4c3eb207Smrgwe want to avoid differences in state-tracking in B and C from
*4c3eb207Smrgleading to blow-up.  If we don't prevent state blowup, we end up
*4c3eb207Smrgwith exponential growth of the exploded graph like this:
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg
*4c3eb207Smrg           1:A
*4c3eb207Smrg          /   \
*4c3eb207Smrg         /     \
*4c3eb207Smrg        /       \
*4c3eb207Smrg      2:B       3:C
*4c3eb207Smrg       |         |
*4c3eb207Smrg      4:D       5:D        (2 exploded nodes for D)
*4c3eb207Smrg     /   \     /   \
*4c3eb207Smrg   6:E   7:F 8:E   9:F
*4c3eb207Smrg    |     |   |     |
*4c3eb207Smrg   10:G 11:G 12:G  13:G    (4 exploded nodes for G)
*4c3eb207Smrg
*4c3eb207Smrg@end smallexample
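
The growth in the diagram above can be counted directly.  As a rough
sketch, assuming every branch of every diamond produces a distinct state
and nothing is merged or pruned, the number of exploded nodes at the
join point doubles with each diamond.  The helper name is hypothetical.

```c
/* Number of exploded nodes at the join point after DIAMONDS consecutive
   if/else diamonds, assuming no merging or pruning and that each branch
   yields a distinct state.  DIAMONDS must be less than 32.  */
unsigned long
unmerged_join_nodes (unsigned int diamonds)
{
  return 1UL << diamonds;
}
```

For the CFG above this gives 2 nodes for D (one diamond) and 4 nodes for
G (two diamonds), matching the diagram.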
*4c3eb207Smrg
*4c3eb207SmrgSimilar issues arise with loops.
*4c3eb207Smrg
*4c3eb207SmrgTo prevent this, we follow various approaches:
*4c3eb207Smrg
*4c3eb207Smrg@enumerate a
*4c3eb207Smrg@item
state pruning, which tries to discard state that won't be relevant
later on within the function.
*4c3eb207SmrgThis can be disabled via @option{-fno-analyzer-state-purge}.
*4c3eb207Smrg
*4c3eb207Smrg@item
*4c3eb207Smrgstate merging.  We can try to find the commonality between two
*4c3eb207Smrgprogram_state instances to make a third, simpler program_state.
*4c3eb207SmrgWe have two strategies here:
*4c3eb207Smrg
*4c3eb207Smrg  @enumerate
*4c3eb207Smrg  @item
*4c3eb207Smrg     the worklist keeps new nodes for the same program_point together,
*4c3eb207Smrg     and tries to merge them before processing, and thus before they have
*4c3eb207Smrg     successors.  Hence, in the above, the two nodes for D (4 and 5) reach
*4c3eb207Smrg     the front of the worklist together, and we create a node for D with
*4c3eb207Smrg     the merger of the incoming states.
*4c3eb207Smrg
*4c3eb207Smrg  @item
*4c3eb207Smrg     try merging with the state of existing enodes for the program_point
*4c3eb207Smrg     (which may have already been explored).  There will be duplication,
*4c3eb207Smrg     but only one set of duplication; subsequent duplicates are more likely
*4c3eb207Smrg     to hit the cache.  In particular, (hopefully) all merger chains are
*4c3eb207Smrg     finite, and so we guarantee termination.
*4c3eb207Smrg     This is intended to help with loops: we ought to explore the first
*4c3eb207Smrg     iteration, and then have a "subsequent iterations" exploration,
*4c3eb207Smrg     which uses a state merged from that of the first, to be more abstract.
*4c3eb207Smrg  @end enumerate
*4c3eb207Smrg
*4c3eb207SmrgWe avoid merging pairs of states that have state-machine differences,
*4c3eb207Smrgas these are the kinds of differences that are likely to be most
*4c3eb207Smrginteresting.  So, for example, given:
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg      if (condition)
*4c3eb207Smrg        ptr = malloc (size);
*4c3eb207Smrg      else
*4c3eb207Smrg        ptr = local_buf;
*4c3eb207Smrg
*4c3eb207Smrg      .... do things with 'ptr'
*4c3eb207Smrg
*4c3eb207Smrg      if (condition)
*4c3eb207Smrg        free (ptr);
*4c3eb207Smrg
*4c3eb207Smrg      ...etc
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
*4c3eb207Smrgthen we end up with an exploded graph that looks like this:
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg
*4c3eb207Smrg                   if (condition)
*4c3eb207Smrg                     / T      \ F
*4c3eb207Smrg            ---------          ----------
*4c3eb207Smrg           /                             \
*4c3eb207Smrg      ptr = malloc (size)             ptr = local_buf
*4c3eb207Smrg          |                               |
*4c3eb207Smrg      copy of                         copy of
*4c3eb207Smrg        "do things with 'ptr'"          "do things with 'ptr'"
*4c3eb207Smrg      with ptr: heap-allocated        with ptr: stack-allocated
*4c3eb207Smrg          |                               |
*4c3eb207Smrg      if (condition)                  if (condition)
*4c3eb207Smrg          | known to be T                 | known to be F
*4c3eb207Smrg      free (ptr);                         |
*4c3eb207Smrg           \                             /
*4c3eb207Smrg            -----------------------------
*4c3eb207Smrg                         | ('ptr' is pruned, so states can be merged)
*4c3eb207Smrg                        etc
*4c3eb207Smrg
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
where some duplication has occurred, but only for the places where the
different paths are worth exploring separately.
*4c3eb207Smrg
*4c3eb207SmrgMerging can be disabled via @option{-fno-analyzer-state-merge}.
*4c3eb207Smrg@end enumerate
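
The merging rule can be sketched with a toy state type.  This is an
illustration under assumptions, not the real @code{program_state} merging
code: @code{toy_state}, @code{try_merge} and the enum values are
invented.  The sketch refuses to merge when any state-machine state
differs, and otherwise keeps the values that agree and replaces the
values that differ with ``unknown''.

```c
#include <stdbool.h>

enum { NUM_VARS = 2 };
enum sm_state { SM_START, SM_ALLOCATED, SM_FREED };
enum { VALUE_UNKNOWN = -1 };

/* Hypothetical simplified state: one state-machine state and one
   concrete value (or VALUE_UNKNOWN) per variable.  */
struct toy_state
{
  enum sm_state sm[NUM_VARS];
  int value[NUM_VARS];
};

/* Try to merge A and B into *OUT.  Return false, leaving *OUT
   untouched, if the state-machine states differ anywhere.  */
bool
try_merge (const struct toy_state *a, const struct toy_state *b,
           struct toy_state *out)
{
  for (int i = 0; i < NUM_VARS; i++)
    if (a->sm[i] != b->sm[i])
      return false;

  for (int i = 0; i < NUM_VARS; i++)
    {
      out->sm[i] = a->sm[i];
      /* Keep common knowledge; anything that differs becomes unknown.  */
      out->value[i] = (a->value[i] == b->value[i]
                       ? a->value[i] : VALUE_UNKNOWN);
    }
  return true;
}
```

The merged state is simpler than either input because it records less,
which is what lets later duplicates hit already-explored nodes.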
*4c3eb207Smrg
*4c3eb207Smrg@subsection Region Model
*4c3eb207Smrg
Part of the state stored at an @code{exploded_node} is a @code{region_model}.
*4c3eb207SmrgThis is an implementation of the region-based ternary model described in
*4c3eb207Smrg@url{http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf,
*4c3eb207Smrg"A Memory Model for Static Analysis of C Programs"}
*4c3eb207Smrg(Zhongxing Xu, Ted Kremenek, and Jian Zhang).
*4c3eb207Smrg
*4c3eb207SmrgA @code{region_model} encapsulates a representation of the state of
*4c3eb207Smrgmemory, with a tree of @code{region} instances, along with their associated
*4c3eb207Smrgvalues.  The representation is graph-like because values can be pointers
to regions.  It also stores a @code{constraint_manager}, capturing
relationships between the values.
*4c3eb207Smrg
*4c3eb207SmrgBecause each node in the @code{exploded_graph} has a @code{region_model},
*4c3eb207Smrgand each of the latter is graph-like, the @code{exploded_graph} is in some
*4c3eb207Smrgways a graph of graphs.
*4c3eb207Smrg
*4c3eb207SmrgHere's an example of printing a @code{region_model}, showing the ASCII-art
*4c3eb207Smrgused to visualize the region hierarchy (colorized when printing to stderr):
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg(gdb) call debug (*this)
*4c3eb207Smrgr0: @{kind: 'root', parent: null, sval: null@}
*4c3eb207Smrg|-stack: r1: @{kind: 'stack', parent: r0, sval: sv1@}
*4c3eb207Smrg|  |: sval: sv1: @{poisoned: uninit@}
*4c3eb207Smrg|  |-frame for 'test': r2: @{kind: 'frame', parent: r1, sval: null, map: @{'ptr_3': r3@}, function: 'test', depth: 0@}
*4c3eb207Smrg|  |  `-'ptr_3': r3: @{kind: 'map', parent: r2, sval: sv3, type: 'void *', map: @{@}@}
*4c3eb207Smrg|  |    |: sval: sv3: @{type: 'void *', unknown@}
*4c3eb207Smrg|  |    |: type: 'void *'
*4c3eb207Smrg|  `-frame for 'calls_malloc': r4: @{kind: 'frame', parent: r1, sval: null, map: @{'result_3': r7, '_4': r8, '<anonymous>': r5@}, function: 'calls_malloc', depth: 1@}
*4c3eb207Smrg|    |-'<anonymous>': r5: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
*4c3eb207Smrg|    |  |: sval: sv4: @{type: 'void *', &r6@}
*4c3eb207Smrg|    |  |: type: 'void *'
*4c3eb207Smrg|    |-'result_3': r7: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
*4c3eb207Smrg|    |  |: sval: sv4: @{type: 'void *', &r6@}
*4c3eb207Smrg|    |  |: type: 'void *'
*4c3eb207Smrg|    `-'_4': r8: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
*4c3eb207Smrg|      |: sval: sv4: @{type: 'void *', &r6@}
*4c3eb207Smrg|      |: type: 'void *'
*4c3eb207Smrg`-heap: r9: @{kind: 'heap', parent: r0, sval: sv2@}
*4c3eb207Smrg  |: sval: sv2: @{poisoned: uninit@}
*4c3eb207Smrg  `-r6: @{kind: 'symbolic', parent: r9, sval: null, map: @{@}@}
*4c3eb207Smrgsvalues:
*4c3eb207Smrg  sv0: @{type: 'size_t', '1024'@}
*4c3eb207Smrg  sv1: @{poisoned: uninit@}
*4c3eb207Smrg  sv2: @{poisoned: uninit@}
*4c3eb207Smrg  sv3: @{type: 'void *', unknown@}
*4c3eb207Smrg  sv4: @{type: 'void *', &r6@}
*4c3eb207Smrgconstraint manager:
*4c3eb207Smrg  equiv classes:
*4c3eb207Smrg    ec0: @{sv0 == '1024'@}
*4c3eb207Smrg    ec1: @{sv4@}
*4c3eb207Smrg  constraints:
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
*4c3eb207SmrgThis is the state at the point of returning from @code{calls_malloc} back
*4c3eb207Smrgto @code{test} in the following:
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrgvoid *
*4c3eb207Smrgcalls_malloc (void)
*4c3eb207Smrg@{
*4c3eb207Smrg  void *result = malloc (1024);
*4c3eb207Smrg  return result;
*4c3eb207Smrg@}
*4c3eb207Smrg
*4c3eb207Smrgvoid test (void)
*4c3eb207Smrg@{
*4c3eb207Smrg  void *ptr = calls_malloc ();
*4c3eb207Smrg  /* etc.  */
*4c3eb207Smrg@}
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
The ``root'' region (``r0'') has a ``stack'' child (``r1''), with two
children: a frame for @code{test} (``r2''), and a frame for
@code{calls_malloc} (``r4'').  These frame regions have child regions for
storing their local variables.  For example, the return region
and various other regions within the @code{calls_malloc} frame all have
value ``sv4'', a pointer to a heap-allocated region ``r6''.  Within the parent
frame, @code{ptr_3} has value ``sv3'', an unknown @code{void *}.
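
The parent links in the dump above form a tree that can be walked
upwards.  The following sketch uses an invented @code{toy_region} struct
with just a kind, a parent pointer and an svalue ID; it is far simpler
than the analyzer's @code{region} classes and is only meant to show how
the hierarchy in the dump is organized.

```c
#include <stdbool.h>
#include <stddef.h>

enum region_kind
{
  RK_ROOT, RK_STACK, RK_FRAME, RK_MAP, RK_HEAP, RK_SYMBOLIC
};

/* Hypothetical simplified region: SVAL is an svalue ID, -1 for null.  */
struct toy_region
{
  enum region_kind kind;
  const struct toy_region *parent;   /* NULL for the root region.  */
  int sval;
};

/* Number of parent links between R and the root.  */
int
region_depth (const struct toy_region *r)
{
  int depth = 0;
  while (r->parent)
    {
      r = r->parent;
      depth++;
    }
  return depth;
}

/* Return true if R is a strict descendant of ANCESTOR.  */
bool
region_is_within (const struct toy_region *r,
                  const struct toy_region *ancestor)
{
  for (r = r->parent; r; r = r->parent)
    if (r == ancestor)
      return true;
  return false;
}
```

Rebuilding part of the dump with this struct, the region for
@code{result_3} (``r7'') sits three levels below the root, and the
symbolic region ``r6'' is within the heap (``r9'') but not within the
stack (``r1'').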
*4c3eb207Smrg
*4c3eb207Smrg@subsection Analyzer Paths
*4c3eb207Smrg
*4c3eb207SmrgWe need to explain to the user what the problem is, and to persuade them
*4c3eb207Smrgthat there really is a problem.  Hence having a @code{diagnostic_path}
*4c3eb207Smrgisn't just an incidental detail of the analyzer; it's required.
*4c3eb207Smrg
*4c3eb207SmrgPaths ought to be:
*4c3eb207Smrg@itemize @bullet
*4c3eb207Smrg@item
*4c3eb207Smrginterprocedurally-valid
*4c3eb207Smrg@item
*4c3eb207Smrgfeasible
*4c3eb207Smrg@end itemize
*4c3eb207Smrg
Without state-merging, all paths in the exploded graph are feasible
(in terms of constraints being satisfied).
*4c3eb207SmrgWith state-merging, paths in the exploded graph can be infeasible.
*4c3eb207Smrg
We collate warnings and only emit them for the simplest path.
For example, for a bug in a utility function with lots of routes to
calling it, we only emit the simplest path (which could be
intraprocedural, if it can be reproduced without a caller).  We apply a
check that each duplicate warning's shortest path is feasible, rejecting
any warnings for which the shortest path is infeasible (which could lead
to false negatives).
*4c3eb207Smrg
*4c3eb207SmrgWe use the shortest feasible @code{exploded_path} through the
*4c3eb207Smrg@code{exploded_graph} (a list of @code{exploded_edge *}) to build a
*4c3eb207Smrg@code{diagnostic_path} (a list of events for the diagnostic subsystem) -
*4c3eb207Smrgspecifically a @code{checker_path}.
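
Finding the shortest route through a graph of unit-cost edges can be done
with a breadth-first search.  The sketch below is a generic BFS over an
edge list, with hypothetical names (@code{toy_edge},
@code{shortest_path_length}); it does not model feasibility checking and
is not a description of how @code{exploded_path} is actually computed.

```c
#include <stddef.h>

enum { MAX_NODES = 16 };

/* Hypothetical directed edge between two node indices.  */
struct toy_edge { int src; int dst; };

/* Return the number of edges on a shortest path from SRC to DST,
   or -1 if DST is unreachable or an argument is out of range.  */
int
shortest_path_length (const struct toy_edge *edges, size_t num_edges,
                      int num_nodes, int src, int dst)
{
  int dist[MAX_NODES];
  int queue[MAX_NODES];   /* Each node is queued at most once.  */
  int head = 0, tail = 0;

  if (num_nodes < 1 || num_nodes > MAX_NODES
      || src < 0 || src >= num_nodes || dst < 0 || dst >= num_nodes)
    return -1;

  for (int i = 0; i < num_nodes; i++)
    dist[i] = -1;           /* -1 marks "not yet visited".  */
  dist[src] = 0;
  queue[tail++] = src;

  while (head < tail)
    {
      int cur = queue[head++];
      if (cur == dst)
        return dist[cur];
      for (size_t i = 0; i < num_edges; i++)
        {
          int next = edges[i].dst;
          if (edges[i].src != cur || next < 0 || next >= num_nodes)
            continue;
          if (dist[next] < 0)
            {
              dist[next] = dist[cur] + 1;
              queue[tail++] = next;
            }
        }
    }
  return -1;
}
```

Because BFS visits nodes in order of increasing distance, the first time
the destination is dequeued its recorded distance is minimal.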
*4c3eb207Smrg
*4c3eb207SmrgHaving built the @code{checker_path}, we prune it to try to eliminate
*4c3eb207Smrgevents that aren't relevant, to minimize how much the user has to read.
*4c3eb207Smrg
*4c3eb207SmrgAfter pruning, we notify each event in the path of its ID and record the
*4c3eb207SmrgIDs of interesting events, allowing for events to refer to other events
*4c3eb207Smrgin their descriptions.  The @code{pending_diagnostic} class has various
*4c3eb207Smrgvfuncs to support emitting more precise descriptions, so that e.g.
*4c3eb207Smrg
*4c3eb207Smrg@itemize @bullet
*4c3eb207Smrg@item
*4c3eb207Smrga deref-of-unchecked-malloc diagnostic might use:
*4c3eb207Smrg@smallexample
*4c3eb207Smrg  returning possibly-NULL pointer to 'make_obj' from 'allocator'
*4c3eb207Smrg@end smallexample
*4c3eb207Smrgfor a @code{return_event} to make it clearer how the unchecked value moves
*4c3eb207Smrgfrom callee back to caller
*4c3eb207Smrg@item
*4c3eb207Smrga double-free diagnostic might use:
*4c3eb207Smrg@smallexample
*4c3eb207Smrg  second 'free' here; first 'free' was at (3)
*4c3eb207Smrg@end smallexample
*4c3eb207Smrgand a use-after-free might use
*4c3eb207Smrg@smallexample
*4c3eb207Smrg  use after 'free' here; memory was freed at (2)
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg@end itemize
*4c3eb207Smrg
*4c3eb207SmrgAt this point we can emit the diagnostic.
*4c3eb207Smrg
*4c3eb207Smrg@subsection Limitations
*4c3eb207Smrg
*4c3eb207Smrg@itemize @bullet
*4c3eb207Smrg@item
*4c3eb207SmrgOnly for C so far
*4c3eb207Smrg@item
*4c3eb207SmrgThe implementation of call summaries is currently very simplistic.
*4c3eb207Smrg@item
*4c3eb207SmrgLack of function pointer analysis
*4c3eb207Smrg@item
*4c3eb207SmrgThe constraint-handling code assumes reflexivity in some places
*4c3eb207Smrg(that values are equal to themselves), which is not the case for NaN.
*4c3eb207SmrgAs a simple workaround, constraints on floating-point values are
*4c3eb207Smrgcurrently ignored.
*4c3eb207Smrg@item
*4c3eb207SmrgThe region model code creates lots of little mutable objects at each
*4c3eb207Smrg@code{region_model} (and thus per @code{exploded_node}) rather than
*4c3eb207Smrgsharing immutable objects and having the mutable state in the
*4c3eb207Smrg@code{program_state} or @code{region_model}.  The latter approach might be
*4c3eb207Smrgmore efficient, and might avoid dealing with IDs rather than pointers
*4c3eb207Smrg(which requires us to impose an ordering to get meaningful equality).
*4c3eb207Smrg@item
*4c3eb207SmrgThe region model code doesn't yet support @code{memcpy}.  At the
*4c3eb207Smrggimple-ssa level these have been optimized to statements like this:
*4c3eb207Smrg@smallexample
*4c3eb207Smrg_10 = MEM <long unsigned int> [(char * @{ref-all@})&c]
*4c3eb207SmrgMEM <long unsigned int> [(char * @{ref-all@})&d] = _10;
*4c3eb207Smrg@end smallexample
*4c3eb207SmrgPerhaps they could be supported via a new @code{compound_svalue} type.
*4c3eb207Smrg@item
*4c3eb207SmrgThere are various other limitations in the region model (grep for TODO/xfail
*4c3eb207Smrgin the testsuite).
*4c3eb207Smrg@item
The @code{constraint_manager}'s implementation of transitivity is currently
too expensive to enable by default and so must be manually enabled via
@option{-fanalyzer-transitivity}.
*4c3eb207Smrg@item
*4c3eb207SmrgThe checkers are currently hardcoded and don't allow for user extensibility
*4c3eb207Smrg(e.g. adding allocate/release pairs).
*4c3eb207Smrg@item
*4c3eb207SmrgAlthough the analyzer's test suite has a proof-of-concept test case for
*4c3eb207SmrgLTO, LTO support hasn't had extensive testing.  There are various
*4c3eb207Smrglang-specific things in the analyzer that assume C rather than LTO.
*4c3eb207SmrgFor example, SSA names are printed to the user in ``raw'' form, rather
*4c3eb207Smrgthan printing the underlying variable name.
*4c3eb207Smrg@end itemize
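
The NaN limitation listed above comes from standard IEEE 754 behavior
rather than anything specific to the analyzer.  A minimal illustration,
with a hypothetical helper name:

```c
#include <math.h>
#include <stdbool.h>

/* Return true if X compares equal to itself.  This holds for every
   ordinary floating-point value but not for NaN.  */
bool
equals_itself (double x)
{
  return x == x;
}
```

Here @code{equals_itself (1.0)} is true but @code{equals_itself (NAN)} is
false, so a constraint solver that assumes reflexivity would draw a wrong
conclusion for floating-point values that may be NaN.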
*4c3eb207Smrg
Some ideas for other checkers:
*4c3eb207Smrg@itemize @bullet
*4c3eb207Smrg@item
*4c3eb207SmrgFile-descriptor-based APIs
*4c3eb207Smrg@item
*4c3eb207SmrgLinux kernel internal APIs
*4c3eb207Smrg@item
*4c3eb207SmrgSignal handling
*4c3eb207Smrg@end itemize
*4c3eb207Smrg
*4c3eb207Smrg@node Debugging the Analyzer
*4c3eb207Smrg@section Debugging the Analyzer
*4c3eb207Smrg@cindex analyzer, debugging
*4c3eb207Smrg@cindex static analyzer, debugging
*4c3eb207Smrg
*4c3eb207Smrg@subsection Special Functions for Debugging the Analyzer
*4c3eb207Smrg
*4c3eb207SmrgThe analyzer recognizes various special functions by name, for use
*4c3eb207Smrgin debugging the analyzer.  Declarations can be seen in the testsuite
*4c3eb207Smrgin @file{analyzer-decls.h}.  None of these functions are actually
*4c3eb207Smrgimplemented.
*4c3eb207Smrg
*4c3eb207SmrgAdd:
*4c3eb207Smrg@smallexample
*4c3eb207Smrg  __analyzer_break ();
*4c3eb207Smrg@end smallexample
*4c3eb207Smrgto the source being analyzed to trigger a breakpoint in the analyzer when
*4c3eb207Smrgthat source is reached.  By putting a series of these in the source, it's
*4c3eb207Smrgmuch easier to effectively step through the program state as it's analyzed.
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg__analyzer_dump ();
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
will dump copious information about the analyzer's state each time it
reaches the call in its traversal of the source.
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg__analyzer_dump_path ();
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
*4c3eb207Smrgwill emit a placeholder ``note'' diagnostic with a path to that call site,
*4c3eb207Smrgif the analyzer finds a feasible path to it.
*4c3eb207Smrg
*4c3eb207SmrgThe builtin @code{__analyzer_dump_exploded_nodes} will emit a warning
*4c3eb207Smrgafter analysis containing information on all of the exploded nodes at that
*4c3eb207Smrgprogram point:
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg  __analyzer_dump_exploded_nodes (0);
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
*4c3eb207Smrgwill output the number of ``processed'' nodes, and the IDs of
*4c3eb207Smrgboth ``processed'' and ``merger'' nodes, such as:
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrgwarning: 2 processed enodes: [EN: 56, EN: 58] merger(s): [EN: 54-55, EN: 57, EN: 59]
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
*4c3eb207SmrgWith a non-zero argument
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg  __analyzer_dump_exploded_nodes (1);
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
*4c3eb207Smrgit will also dump all of the states within the ``processed'' nodes.
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg   __analyzer_dump_region_model ();
*4c3eb207Smrg@end smallexample
will dump the @code{region_model}'s state to stderr.
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg__analyzer_eval (expr);
*4c3eb207Smrg@end smallexample
will emit a warning with text "TRUE", "FALSE" or "UNKNOWN" based on the
truthfulness of the argument.  This is useful for writing DejaGnu tests.
*4c3eb207Smrg
*4c3eb207Smrg
*4c3eb207Smrg@subsection Other Debugging Techniques
*4c3eb207Smrg
*4c3eb207SmrgOne approach when tracking down where a particular bogus state is
*4c3eb207Smrgintroduced into the @code{exploded_graph} is to add custom code to
*4c3eb207Smrg@code{region_model::validate}.
*4c3eb207Smrg
*4c3eb207SmrgFor example, this custom code (added to @code{region_model::validate})
*4c3eb207Smrgbreaks with an assertion failure when a variable called @code{ptr}
*4c3eb207Smrgacquires a value that's unknown, using
*4c3eb207Smrg@code{region_model::get_value_by_name} to locate the variable
*4c3eb207Smrg
*4c3eb207Smrg@smallexample
*4c3eb207Smrg    /* Find a variable matching "ptr".  */
*4c3eb207Smrg    svalue_id sid = get_value_by_name ("ptr");
*4c3eb207Smrg    if (!sid.null_p ())
*4c3eb207Smrg      @{
*4c3eb207Smrg	svalue *sval = get_svalue (sid);
*4c3eb207Smrg	gcc_assert (sval->get_kind () != SK_UNKNOWN);
*4c3eb207Smrg      @}
*4c3eb207Smrg@end smallexample
*4c3eb207Smrg
*4c3eb207Smrgmaking it easier to investigate further in a debugger when this occurs.