xref: /netbsd-src/external/gpl3/gcc.old/dist/gcc/doc/analyzer.texi (revision 4c3eb207d36f67d31994830c0a694161fc1ca39b)
1*4c3eb207Smrg@c Copyright (C) 2019 Free Software Foundation, Inc.
2*4c3eb207Smrg@c This is part of the GCC manual.
3*4c3eb207Smrg@c For copying conditions, see the file gcc.texi.
4*4c3eb207Smrg@c Contributed by David Malcolm <dmalcolm@redhat.com>.
5*4c3eb207Smrg
6*4c3eb207Smrg@node Static Analyzer
7*4c3eb207Smrg@chapter Static Analyzer
8*4c3eb207Smrg@cindex analyzer
9*4c3eb207Smrg@cindex static analysis
10*4c3eb207Smrg@cindex static analyzer
11*4c3eb207Smrg
12*4c3eb207Smrg@menu
13*4c3eb207Smrg* Analyzer Internals::       Analyzer Internals
14*4c3eb207Smrg* Debugging the Analyzer::   Useful debugging tips
15*4c3eb207Smrg@end menu
16*4c3eb207Smrg
17*4c3eb207Smrg@node Analyzer Internals
18*4c3eb207Smrg@section Analyzer Internals
19*4c3eb207Smrg@cindex analyzer, internals
20*4c3eb207Smrg@cindex static analyzer, internals
21*4c3eb207Smrg
22*4c3eb207Smrg@subsection Overview
23*4c3eb207Smrg
24*4c3eb207SmrgThe analyzer implementation works on the gimple-SSA representation.
25*4c3eb207Smrg(I chose this in the hopes of making it easy to work with LTO to
26*4c3eb207Smrgdo whole-program analysis).
27*4c3eb207Smrg
28*4c3eb207SmrgThe implementation is read-only: it doesn't attempt to change anything,
29*4c3eb207Smrgjust emit warnings.
30*4c3eb207Smrg
31*4c3eb207SmrgThe gimple representation can be seen using @option{-fdump-ipa-analyzer}.
32*4c3eb207Smrg
33*4c3eb207SmrgFirst, we build a @code{supergraph} which combines the callgraph and all
34*4c3eb207Smrgof the CFGs into a single directed graph, with both interprocedural and
35*4c3eb207Smrgintraprocedural edges.  The nodes and edges in the supergraph are called
36*4c3eb207Smrg``supernodes'' and ``superedges'', and often referred to in code as
37*4c3eb207Smrg@code{snodes} and @code{sedges}.  Basic blocks in the CFGs are split at
38*4c3eb207Smrginterprocedural calls, so there can be more than one supernode per
39*4c3eb207Smrgbasic block.  Most statements will be in just one supernode, but a call
40*4c3eb207Smrgstatement can appear in two supernodes: at the end of one for the call,
41*4c3eb207Smrgand again at the start of another for the return.
42*4c3eb207Smrg
43*4c3eb207SmrgThe supergraph can be seen using @option{-fdump-analyzer-supergraph}.
44*4c3eb207Smrg
45*4c3eb207SmrgWe then build an @code{analysis_plan} which walks the callgraph to
46*4c3eb207Smrgdetermine which calls might be suitable for being summarized (rather
47*4c3eb207Smrgthan fully explored) and thus in what order to explore the functions.
48*4c3eb207Smrg
49*4c3eb207SmrgNext is the heart of the analyzer: we use a worklist to explore state
50*4c3eb207Smrgwithin the supergraph, building an "exploded graph".
51*4c3eb207SmrgNodes in the exploded graph correspond to <point,@w{ }state> pairs, as in
52*4c3eb207Smrg     "Precise Interprocedural Dataflow Analysis via Graph Reachability"
53*4c3eb207Smrg     (Thomas Reps, Susan Horwitz and Mooly Sagiv).
54*4c3eb207Smrg
55*4c3eb207SmrgWe reuse nodes for <point, state> pairs we've already seen, and avoid
56*4c3eb207Smrgtracking state too closely, so that (hopefully) we rapidly converge
57*4c3eb207Smrgon a final exploded graph, and terminate the analysis.  We also bail
58*4c3eb207Smrgout if the number of exploded <end-of-basic-block, state> nodes gets
59*4c3eb207Smrglarger than a particular multiple of the total number of basic blocks
60*4c3eb207Smrg(to ensure termination in the face of pathological state-explosion
61*4c3eb207Smrgcases, or bugs).  We also stop exploring a point once we hit a limit
62*4c3eb207Smrgof states for that point.
63*4c3eb207Smrg
64*4c3eb207SmrgWe can identify problems directly when processing a <point,@w{ }state>
65*4c3eb207Smrginstance.  For example, if we're finding the successors of
66*4c3eb207Smrg
67*4c3eb207Smrg@smallexample
68*4c3eb207Smrg   <point: before-stmt: "free (ptr);",
69*4c3eb207Smrg    state: @{"ptr": freed@}>
70*4c3eb207Smrg@end smallexample
71*4c3eb207Smrg
72*4c3eb207Smrgthen we can detect a double-free of "ptr".  We can then emit a path
73*4c3eb207Smrgto reach the problem by finding the simplest route through the graph.
74*4c3eb207Smrg
75*4c3eb207SmrgProgram points in the analysis are much more fine-grained than in the
76*4c3eb207SmrgCFG and supergraph, with points (and thus potentially exploded nodes)
77*4c3eb207Smrgfor various events, including before individual statements.
78*4c3eb207SmrgBy default the exploded graph merges multiple consecutive statements
79*4c3eb207Smrgin a supernode into one exploded edge to minimize the size of the
80*4c3eb207Smrgexploded graph.  This can be suppressed via
81*4c3eb207Smrg@option{-fanalyzer-fine-grained}.
82*4c3eb207SmrgThe fine-grained approach seems to make things simpler and more debuggable
83*4c3eb207Smrgthat other approaches I tried, in that each point is responsible for one
84*4c3eb207Smrgthing.
85*4c3eb207Smrg
86*4c3eb207SmrgProgram points in the analysis also have a "call string" identifying the
87*4c3eb207Smrgstack of callsites below them, so that paths in the exploded graph
88*4c3eb207Smrgcorrespond to interprocedurally valid paths: we always return to the
89*4c3eb207Smrgcorrect call site, propagating state information accordingly.
90*4c3eb207SmrgWe avoid infinite recursion by stopping the analysis if a callsite
91*4c3eb207Smrgappears more than @code{analyzer-max-recursion-depth} in a callstring
92*4c3eb207Smrg(defaulting to 2).
93*4c3eb207Smrg
94*4c3eb207Smrg@subsection Graphs
95*4c3eb207Smrg
96*4c3eb207SmrgNodes and edges in the exploded graph are called ``exploded nodes'' and
97*4c3eb207Smrg``exploded edges'' and often referred to in the code as
98*4c3eb207Smrg@code{enodes} and @code{eedges} (especially when distinguishing them
99*4c3eb207Smrgfrom the @code{snodes} and @code{sedges} in the supergraph).
100*4c3eb207Smrg
101*4c3eb207SmrgEach graph numbers its nodes, giving unique identifiers - supernodes
102*4c3eb207Smrgare referred to throughout dumps in the form @samp{SN': @var{index}} and
103*4c3eb207Smrgexploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and
104*4c3eb207Smrg@samp{EN:29}).
105*4c3eb207Smrg
106*4c3eb207SmrgThe supergraph can be seen using @option{-fdump-analyzer-supergraph-graph}.
107*4c3eb207Smrg
108*4c3eb207SmrgThe exploded graph can be seen using @option{-fdump-analyzer-exploded-graph}
109*4c3eb207Smrgand other dump options.  Exploded nodes are color-coded in the .dot output
110*4c3eb207Smrgbased on state-machine states to make it easier to see state changes at
111*4c3eb207Smrga glance.
112*4c3eb207Smrg
113*4c3eb207Smrg@subsection State Tracking
114*4c3eb207Smrg
115*4c3eb207SmrgThere's a tension between:
116*4c3eb207Smrg@itemize @bullet
117*4c3eb207Smrg@item
118*4c3eb207Smrgprecision of analysis in the straight-line case, vs
119*4c3eb207Smrg@item
120*4c3eb207Smrgexponential blow-up in the face of control flow.
121*4c3eb207Smrg@end itemize
122*4c3eb207Smrg
123*4c3eb207SmrgFor example, in general, given this CFG:
124*4c3eb207Smrg
125*4c3eb207Smrg@smallexample
126*4c3eb207Smrg      A
127*4c3eb207Smrg     / \
128*4c3eb207Smrg    B   C
129*4c3eb207Smrg     \ /
130*4c3eb207Smrg      D
131*4c3eb207Smrg     / \
132*4c3eb207Smrg    E   F
133*4c3eb207Smrg     \ /
134*4c3eb207Smrg      G
135*4c3eb207Smrg@end smallexample
136*4c3eb207Smrg
137*4c3eb207Smrgwe want to avoid differences in state-tracking in B and C from
138*4c3eb207Smrgleading to blow-up.  If we don't prevent state blowup, we end up
139*4c3eb207Smrgwith exponential growth of the exploded graph like this:
140*4c3eb207Smrg
141*4c3eb207Smrg@smallexample
142*4c3eb207Smrg
143*4c3eb207Smrg           1:A
144*4c3eb207Smrg          /   \
145*4c3eb207Smrg         /     \
146*4c3eb207Smrg        /       \
147*4c3eb207Smrg      2:B       3:C
148*4c3eb207Smrg       |         |
149*4c3eb207Smrg      4:D       5:D        (2 exploded nodes for D)
150*4c3eb207Smrg     /   \     /   \
151*4c3eb207Smrg   6:E   7:F 8:E   9:F
152*4c3eb207Smrg    |     |   |     |
153*4c3eb207Smrg   10:G 11:G 12:G  13:G    (4 exploded nodes for G)
154*4c3eb207Smrg
155*4c3eb207Smrg@end smallexample
156*4c3eb207Smrg
157*4c3eb207SmrgSimilar issues arise with loops.
158*4c3eb207Smrg
159*4c3eb207SmrgTo prevent this, we follow various approaches:
160*4c3eb207Smrg
161*4c3eb207Smrg@enumerate a
162*4c3eb207Smrg@item
163*4c3eb207Smrgstate pruning: which tries to discard state that won't be relevant
164*4c3eb207Smrglater on withing the function.
165*4c3eb207SmrgThis can be disabled via @option{-fno-analyzer-state-purge}.
166*4c3eb207Smrg
167*4c3eb207Smrg@item
168*4c3eb207Smrgstate merging.  We can try to find the commonality between two
169*4c3eb207Smrgprogram_state instances to make a third, simpler program_state.
170*4c3eb207SmrgWe have two strategies here:
171*4c3eb207Smrg
172*4c3eb207Smrg  @enumerate
173*4c3eb207Smrg  @item
174*4c3eb207Smrg     the worklist keeps new nodes for the same program_point together,
175*4c3eb207Smrg     and tries to merge them before processing, and thus before they have
176*4c3eb207Smrg     successors.  Hence, in the above, the two nodes for D (4 and 5) reach
177*4c3eb207Smrg     the front of the worklist together, and we create a node for D with
178*4c3eb207Smrg     the merger of the incoming states.
179*4c3eb207Smrg
180*4c3eb207Smrg  @item
181*4c3eb207Smrg     try merging with the state of existing enodes for the program_point
182*4c3eb207Smrg     (which may have already been explored).  There will be duplication,
183*4c3eb207Smrg     but only one set of duplication; subsequent duplicates are more likely
184*4c3eb207Smrg     to hit the cache.  In particular, (hopefully) all merger chains are
185*4c3eb207Smrg     finite, and so we guarantee termination.
186*4c3eb207Smrg     This is intended to help with loops: we ought to explore the first
187*4c3eb207Smrg     iteration, and then have a "subsequent iterations" exploration,
188*4c3eb207Smrg     which uses a state merged from that of the first, to be more abstract.
189*4c3eb207Smrg  @end enumerate
190*4c3eb207Smrg
191*4c3eb207SmrgWe avoid merging pairs of states that have state-machine differences,
192*4c3eb207Smrgas these are the kinds of differences that are likely to be most
193*4c3eb207Smrginteresting.  So, for example, given:
194*4c3eb207Smrg
195*4c3eb207Smrg@smallexample
196*4c3eb207Smrg      if (condition)
197*4c3eb207Smrg        ptr = malloc (size);
198*4c3eb207Smrg      else
199*4c3eb207Smrg        ptr = local_buf;
200*4c3eb207Smrg
201*4c3eb207Smrg      .... do things with 'ptr'
202*4c3eb207Smrg
203*4c3eb207Smrg      if (condition)
204*4c3eb207Smrg        free (ptr);
205*4c3eb207Smrg
206*4c3eb207Smrg      ...etc
207*4c3eb207Smrg@end smallexample
208*4c3eb207Smrg
209*4c3eb207Smrgthen we end up with an exploded graph that looks like this:
210*4c3eb207Smrg
211*4c3eb207Smrg@smallexample
212*4c3eb207Smrg
213*4c3eb207Smrg                   if (condition)
214*4c3eb207Smrg                     / T      \ F
215*4c3eb207Smrg            ---------          ----------
216*4c3eb207Smrg           /                             \
217*4c3eb207Smrg      ptr = malloc (size)             ptr = local_buf
218*4c3eb207Smrg          |                               |
219*4c3eb207Smrg      copy of                         copy of
220*4c3eb207Smrg        "do things with 'ptr'"          "do things with 'ptr'"
221*4c3eb207Smrg      with ptr: heap-allocated        with ptr: stack-allocated
222*4c3eb207Smrg          |                               |
223*4c3eb207Smrg      if (condition)                  if (condition)
224*4c3eb207Smrg          | known to be T                 | known to be F
225*4c3eb207Smrg      free (ptr);                         |
226*4c3eb207Smrg           \                             /
227*4c3eb207Smrg            -----------------------------
228*4c3eb207Smrg                         | ('ptr' is pruned, so states can be merged)
229*4c3eb207Smrg                        etc
230*4c3eb207Smrg
231*4c3eb207Smrg@end smallexample
232*4c3eb207Smrg
233*4c3eb207Smrgwhere some duplication has occurred, but only for the places where the
234*4c3eb207Smrgthe different paths are worth exploringly separately.
235*4c3eb207Smrg
236*4c3eb207SmrgMerging can be disabled via @option{-fno-analyzer-state-merge}.
237*4c3eb207Smrg@end enumerate
238*4c3eb207Smrg
239*4c3eb207Smrg@subsection Region Model
240*4c3eb207Smrg
241*4c3eb207SmrgPart of the state stored at a @code{exploded_node} is a @code{region_model}.
242*4c3eb207SmrgThis is an implementation of the region-based ternary model described in
243*4c3eb207Smrg@url{http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf,
244*4c3eb207Smrg"A Memory Model for Static Analysis of C Programs"}
245*4c3eb207Smrg(Zhongxing Xu, Ted Kremenek, and Jian Zhang).
246*4c3eb207Smrg
247*4c3eb207SmrgA @code{region_model} encapsulates a representation of the state of
248*4c3eb207Smrgmemory, with a tree of @code{region} instances, along with their associated
249*4c3eb207Smrgvalues.  The representation is graph-like because values can be pointers
250*4c3eb207Smrgto regions.  It also stores a constraint_manager, capturing relationships
251*4c3eb207Smrgbetween the values.
252*4c3eb207Smrg
253*4c3eb207SmrgBecause each node in the @code{exploded_graph} has a @code{region_model},
254*4c3eb207Smrgand each of the latter is graph-like, the @code{exploded_graph} is in some
255*4c3eb207Smrgways a graph of graphs.
256*4c3eb207Smrg
257*4c3eb207SmrgHere's an example of printing a @code{region_model}, showing the ASCII-art
258*4c3eb207Smrgused to visualize the region hierarchy (colorized when printing to stderr):
259*4c3eb207Smrg
260*4c3eb207Smrg@smallexample
261*4c3eb207Smrg(gdb) call debug (*this)
262*4c3eb207Smrgr0: @{kind: 'root', parent: null, sval: null@}
263*4c3eb207Smrg|-stack: r1: @{kind: 'stack', parent: r0, sval: sv1@}
264*4c3eb207Smrg|  |: sval: sv1: @{poisoned: uninit@}
265*4c3eb207Smrg|  |-frame for 'test': r2: @{kind: 'frame', parent: r1, sval: null, map: @{'ptr_3': r3@}, function: 'test', depth: 0@}
266*4c3eb207Smrg|  |  `-'ptr_3': r3: @{kind: 'map', parent: r2, sval: sv3, type: 'void *', map: @{@}@}
267*4c3eb207Smrg|  |    |: sval: sv3: @{type: 'void *', unknown@}
268*4c3eb207Smrg|  |    |: type: 'void *'
269*4c3eb207Smrg|  `-frame for 'calls_malloc': r4: @{kind: 'frame', parent: r1, sval: null, map: @{'result_3': r7, '_4': r8, '<anonymous>': r5@}, function: 'calls_malloc', depth: 1@}
270*4c3eb207Smrg|    |-'<anonymous>': r5: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
271*4c3eb207Smrg|    |  |: sval: sv4: @{type: 'void *', &r6@}
272*4c3eb207Smrg|    |  |: type: 'void *'
273*4c3eb207Smrg|    |-'result_3': r7: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
274*4c3eb207Smrg|    |  |: sval: sv4: @{type: 'void *', &r6@}
275*4c3eb207Smrg|    |  |: type: 'void *'
276*4c3eb207Smrg|    `-'_4': r8: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
277*4c3eb207Smrg|      |: sval: sv4: @{type: 'void *', &r6@}
278*4c3eb207Smrg|      |: type: 'void *'
279*4c3eb207Smrg`-heap: r9: @{kind: 'heap', parent: r0, sval: sv2@}
280*4c3eb207Smrg  |: sval: sv2: @{poisoned: uninit@}
281*4c3eb207Smrg  `-r6: @{kind: 'symbolic', parent: r9, sval: null, map: @{@}@}
282*4c3eb207Smrgsvalues:
283*4c3eb207Smrg  sv0: @{type: 'size_t', '1024'@}
284*4c3eb207Smrg  sv1: @{poisoned: uninit@}
285*4c3eb207Smrg  sv2: @{poisoned: uninit@}
286*4c3eb207Smrg  sv3: @{type: 'void *', unknown@}
287*4c3eb207Smrg  sv4: @{type: 'void *', &r6@}
288*4c3eb207Smrgconstraint manager:
289*4c3eb207Smrg  equiv classes:
290*4c3eb207Smrg    ec0: @{sv0 == '1024'@}
291*4c3eb207Smrg    ec1: @{sv4@}
292*4c3eb207Smrg  constraints:
293*4c3eb207Smrg@end smallexample
294*4c3eb207Smrg
295*4c3eb207SmrgThis is the state at the point of returning from @code{calls_malloc} back
296*4c3eb207Smrgto @code{test} in the following:
297*4c3eb207Smrg
298*4c3eb207Smrg@smallexample
299*4c3eb207Smrgvoid *
300*4c3eb207Smrgcalls_malloc (void)
301*4c3eb207Smrg@{
302*4c3eb207Smrg  void *result = malloc (1024);
303*4c3eb207Smrg  return result;
304*4c3eb207Smrg@}
305*4c3eb207Smrg
306*4c3eb207Smrgvoid test (void)
307*4c3eb207Smrg@{
308*4c3eb207Smrg  void *ptr = calls_malloc ();
309*4c3eb207Smrg  /* etc.  */
310*4c3eb207Smrg@}
311*4c3eb207Smrg@end smallexample
312*4c3eb207Smrg
313*4c3eb207SmrgThe ``root'' region (``r0'') has a ``stack'' child (``r1''), with two
314*4c3eb207Smrgchildren: a frame for @code{test} (``r2''), and a frame for
315*4c3eb207Smrg@code{calls_malloc} (``r4'').  These frame regions have child regions for
316*4c3eb207Smrgstoring their local variables.  For example, the return region
317*4c3eb207Smrgand that of various other regions within the ``calls_malloc'' frame all have
318*4c3eb207Smrgvalue ``sv4'', a pointer to a heap-allocated region ``r6''.  Within the parent
319*4c3eb207Smrgframe, @code{ptr_3} has value ``sv3'', an unknown @code{void *}.
320*4c3eb207Smrg
321*4c3eb207Smrg@subsection Analyzer Paths
322*4c3eb207Smrg
323*4c3eb207SmrgWe need to explain to the user what the problem is, and to persuade them
324*4c3eb207Smrgthat there really is a problem.  Hence having a @code{diagnostic_path}
325*4c3eb207Smrgisn't just an incidental detail of the analyzer; it's required.
326*4c3eb207Smrg
327*4c3eb207SmrgPaths ought to be:
328*4c3eb207Smrg@itemize @bullet
329*4c3eb207Smrg@item
330*4c3eb207Smrginterprocedurally-valid
331*4c3eb207Smrg@item
332*4c3eb207Smrgfeasible
333*4c3eb207Smrg@end itemize
334*4c3eb207Smrg
335*4c3eb207SmrgWithout state-merging, all paths in the exploded graph are feasible
336*4c3eb207Smrg(in terms of constraints being satisified).
337*4c3eb207SmrgWith state-merging, paths in the exploded graph can be infeasible.
338*4c3eb207Smrg
339*4c3eb207SmrgWe collate warnings and only emit them for the simplest path
340*4c3eb207Smrge.g. for a bug in a utility function, with lots of routes to calling it,
341*4c3eb207Smrgwe only emit the simplest path (which could be intraprocedural, if
342*4c3eb207Smrgit can be reproduced without a caller).  We apply a check that
343*4c3eb207Smrgeach duplicate warning's shortest path is feasible, rejecting any
344*4c3eb207Smrgwarnings for which the shortest path is infeasible (which could lead to
345*4c3eb207Smrgfalse negatives).
346*4c3eb207Smrg
347*4c3eb207SmrgWe use the shortest feasible @code{exploded_path} through the
348*4c3eb207Smrg@code{exploded_graph} (a list of @code{exploded_edge *}) to build a
349*4c3eb207Smrg@code{diagnostic_path} (a list of events for the diagnostic subsystem) -
350*4c3eb207Smrgspecifically a @code{checker_path}.
351*4c3eb207Smrg
352*4c3eb207SmrgHaving built the @code{checker_path}, we prune it to try to eliminate
353*4c3eb207Smrgevents that aren't relevant, to minimize how much the user has to read.
354*4c3eb207Smrg
355*4c3eb207SmrgAfter pruning, we notify each event in the path of its ID and record the
356*4c3eb207SmrgIDs of interesting events, allowing for events to refer to other events
357*4c3eb207Smrgin their descriptions.  The @code{pending_diagnostic} class has various
358*4c3eb207Smrgvfuncs to support emitting more precise descriptions, so that e.g.
359*4c3eb207Smrg
360*4c3eb207Smrg@itemize @bullet
361*4c3eb207Smrg@item
362*4c3eb207Smrga deref-of-unchecked-malloc diagnostic might use:
363*4c3eb207Smrg@smallexample
364*4c3eb207Smrg  returning possibly-NULL pointer to 'make_obj' from 'allocator'
365*4c3eb207Smrg@end smallexample
366*4c3eb207Smrgfor a @code{return_event} to make it clearer how the unchecked value moves
367*4c3eb207Smrgfrom callee back to caller
368*4c3eb207Smrg@item
369*4c3eb207Smrga double-free diagnostic might use:
370*4c3eb207Smrg@smallexample
371*4c3eb207Smrg  second 'free' here; first 'free' was at (3)
372*4c3eb207Smrg@end smallexample
373*4c3eb207Smrgand a use-after-free might use
374*4c3eb207Smrg@smallexample
375*4c3eb207Smrg  use after 'free' here; memory was freed at (2)
376*4c3eb207Smrg@end smallexample
377*4c3eb207Smrg@end itemize
378*4c3eb207Smrg
379*4c3eb207SmrgAt this point we can emit the diagnostic.
380*4c3eb207Smrg
381*4c3eb207Smrg@subsection Limitations
382*4c3eb207Smrg
383*4c3eb207Smrg@itemize @bullet
384*4c3eb207Smrg@item
385*4c3eb207SmrgOnly for C so far
386*4c3eb207Smrg@item
387*4c3eb207SmrgThe implementation of call summaries is currently very simplistic.
388*4c3eb207Smrg@item
389*4c3eb207SmrgLack of function pointer analysis
390*4c3eb207Smrg@item
391*4c3eb207SmrgThe constraint-handling code assumes reflexivity in some places
392*4c3eb207Smrg(that values are equal to themselves), which is not the case for NaN.
393*4c3eb207SmrgAs a simple workaround, constraints on floating-point values are
394*4c3eb207Smrgcurrently ignored.
395*4c3eb207Smrg@item
396*4c3eb207SmrgThe region model code creates lots of little mutable objects at each
397*4c3eb207Smrg@code{region_model} (and thus per @code{exploded_node}) rather than
398*4c3eb207Smrgsharing immutable objects and having the mutable state in the
399*4c3eb207Smrg@code{program_state} or @code{region_model}.  The latter approach might be
400*4c3eb207Smrgmore efficient, and might avoid dealing with IDs rather than pointers
401*4c3eb207Smrg(which requires us to impose an ordering to get meaningful equality).
402*4c3eb207Smrg@item
403*4c3eb207SmrgThe region model code doesn't yet support @code{memcpy}.  At the
404*4c3eb207Smrggimple-ssa level these have been optimized to statements like this:
405*4c3eb207Smrg@smallexample
406*4c3eb207Smrg_10 = MEM <long unsigned int> [(char * @{ref-all@})&c]
407*4c3eb207SmrgMEM <long unsigned int> [(char * @{ref-all@})&d] = _10;
408*4c3eb207Smrg@end smallexample
409*4c3eb207SmrgPerhaps they could be supported via a new @code{compound_svalue} type.
410*4c3eb207Smrg@item
411*4c3eb207SmrgThere are various other limitations in the region model (grep for TODO/xfail
412*4c3eb207Smrgin the testsuite).
413*4c3eb207Smrg@item
414*4c3eb207SmrgThe constraint_manager's implementation of transitivity is currently too
415*4c3eb207Smrgexpensive to enable by default and so must be manually enabled via
416*4c3eb207Smrg@option{-fanalyzer-transitivity}).
417*4c3eb207Smrg@item
418*4c3eb207SmrgThe checkers are currently hardcoded and don't allow for user extensibility
419*4c3eb207Smrg(e.g. adding allocate/release pairs).
420*4c3eb207Smrg@item
421*4c3eb207SmrgAlthough the analyzer's test suite has a proof-of-concept test case for
422*4c3eb207SmrgLTO, LTO support hasn't had extensive testing.  There are various
423*4c3eb207Smrglang-specific things in the analyzer that assume C rather than LTO.
424*4c3eb207SmrgFor example, SSA names are printed to the user in ``raw'' form, rather
425*4c3eb207Smrgthan printing the underlying variable name.
426*4c3eb207Smrg@end itemize
427*4c3eb207Smrg
428*4c3eb207SmrgSome ideas for other checkers
429*4c3eb207Smrg@itemize @bullet
430*4c3eb207Smrg@item
431*4c3eb207SmrgFile-descriptor-based APIs
432*4c3eb207Smrg@item
433*4c3eb207SmrgLinux kernel internal APIs
434*4c3eb207Smrg@item
435*4c3eb207SmrgSignal handling
436*4c3eb207Smrg@end itemize
437*4c3eb207Smrg
438*4c3eb207Smrg@node Debugging the Analyzer
439*4c3eb207Smrg@section Debugging the Analyzer
440*4c3eb207Smrg@cindex analyzer, debugging
441*4c3eb207Smrg@cindex static analyzer, debugging
442*4c3eb207Smrg
443*4c3eb207Smrg@subsection Special Functions for Debugging the Analyzer
444*4c3eb207Smrg
445*4c3eb207SmrgThe analyzer recognizes various special functions by name, for use
446*4c3eb207Smrgin debugging the analyzer.  Declarations can be seen in the testsuite
447*4c3eb207Smrgin @file{analyzer-decls.h}.  None of these functions are actually
448*4c3eb207Smrgimplemented.
449*4c3eb207Smrg
450*4c3eb207SmrgAdd:
451*4c3eb207Smrg@smallexample
452*4c3eb207Smrg  __analyzer_break ();
453*4c3eb207Smrg@end smallexample
454*4c3eb207Smrgto the source being analyzed to trigger a breakpoint in the analyzer when
455*4c3eb207Smrgthat source is reached.  By putting a series of these in the source, it's
456*4c3eb207Smrgmuch easier to effectively step through the program state as it's analyzed.
457*4c3eb207Smrg
458*4c3eb207Smrg@smallexample
459*4c3eb207Smrg__analyzer_dump ();
460*4c3eb207Smrg@end smallexample
461*4c3eb207Smrg
462*4c3eb207Smrgwill dump the copious information about the analyzer's state each time it
463*4c3eb207Smrgreaches the call in its traversal of the source.
464*4c3eb207Smrg
465*4c3eb207Smrg@smallexample
466*4c3eb207Smrg__analyzer_dump_path ();
467*4c3eb207Smrg@end smallexample
468*4c3eb207Smrg
469*4c3eb207Smrgwill emit a placeholder ``note'' diagnostic with a path to that call site,
470*4c3eb207Smrgif the analyzer finds a feasible path to it.
471*4c3eb207Smrg
472*4c3eb207SmrgThe builtin @code{__analyzer_dump_exploded_nodes} will emit a warning
473*4c3eb207Smrgafter analysis containing information on all of the exploded nodes at that
474*4c3eb207Smrgprogram point:
475*4c3eb207Smrg
476*4c3eb207Smrg@smallexample
477*4c3eb207Smrg  __analyzer_dump_exploded_nodes (0);
478*4c3eb207Smrg@end smallexample
479*4c3eb207Smrg
480*4c3eb207Smrgwill output the number of ``processed'' nodes, and the IDs of
481*4c3eb207Smrgboth ``processed'' and ``merger'' nodes, such as:
482*4c3eb207Smrg
483*4c3eb207Smrg@smallexample
484*4c3eb207Smrgwarning: 2 processed enodes: [EN: 56, EN: 58] merger(s): [EN: 54-55, EN: 57, EN: 59]
485*4c3eb207Smrg@end smallexample
486*4c3eb207Smrg
487*4c3eb207SmrgWith a non-zero argument
488*4c3eb207Smrg
489*4c3eb207Smrg@smallexample
490*4c3eb207Smrg  __analyzer_dump_exploded_nodes (1);
491*4c3eb207Smrg@end smallexample
492*4c3eb207Smrg
493*4c3eb207Smrgit will also dump all of the states within the ``processed'' nodes.
494*4c3eb207Smrg
495*4c3eb207Smrg@smallexample
496*4c3eb207Smrg   __analyzer_dump_region_model ();
497*4c3eb207Smrg@end smallexample
498*4c3eb207Smrgwill dump the region_model's state to stderr.
499*4c3eb207Smrg
500*4c3eb207Smrg@smallexample
501*4c3eb207Smrg__analyzer_eval (expr);
502*4c3eb207Smrg@end smallexample
503*4c3eb207Smrgwill emit a warning with text "TRUE", FALSE" or "UNKNOWN" based on the
504*4c3eb207Smrgtruthfulness of the argument.  This is useful for writing DejaGnu tests.
505*4c3eb207Smrg
506*4c3eb207Smrg
507*4c3eb207Smrg@subsection Other Debugging Techniques
508*4c3eb207Smrg
509*4c3eb207SmrgOne approach when tracking down where a particular bogus state is
510*4c3eb207Smrgintroduced into the @code{exploded_graph} is to add custom code to
511*4c3eb207Smrg@code{region_model::validate}.
512*4c3eb207Smrg
513*4c3eb207SmrgFor example, this custom code (added to @code{region_model::validate})
514*4c3eb207Smrgbreaks with an assertion failure when a variable called @code{ptr}
515*4c3eb207Smrgacquires a value that's unknown, using
516*4c3eb207Smrg@code{region_model::get_value_by_name} to locate the variable
517*4c3eb207Smrg
518*4c3eb207Smrg@smallexample
519*4c3eb207Smrg    /* Find a variable matching "ptr".  */
520*4c3eb207Smrg    svalue_id sid = get_value_by_name ("ptr");
521*4c3eb207Smrg    if (!sid.null_p ())
522*4c3eb207Smrg      @{
523*4c3eb207Smrg	svalue *sval = get_svalue (sid);
524*4c3eb207Smrg	gcc_assert (sval->get_kind () != SK_UNKNOWN);
525*4c3eb207Smrg      @}
526*4c3eb207Smrg@end smallexample
527*4c3eb207Smrg
528*4c3eb207Smrgmaking it easier to investigate further in a debugger when this occurs.
529