1*4c3eb207Smrg@c Copyright (C) 2019 Free Software Foundation, Inc. 2*4c3eb207Smrg@c This is part of the GCC manual. 3*4c3eb207Smrg@c For copying conditions, see the file gcc.texi. 4*4c3eb207Smrg@c Contributed by David Malcolm <dmalcolm@redhat.com>. 5*4c3eb207Smrg 6*4c3eb207Smrg@node Static Analyzer 7*4c3eb207Smrg@chapter Static Analyzer 8*4c3eb207Smrg@cindex analyzer 9*4c3eb207Smrg@cindex static analysis 10*4c3eb207Smrg@cindex static analyzer 11*4c3eb207Smrg 12*4c3eb207Smrg@menu 13*4c3eb207Smrg* Analyzer Internals:: Analyzer Internals 14*4c3eb207Smrg* Debugging the Analyzer:: Useful debugging tips 15*4c3eb207Smrg@end menu 16*4c3eb207Smrg 17*4c3eb207Smrg@node Analyzer Internals 18*4c3eb207Smrg@section Analyzer Internals 19*4c3eb207Smrg@cindex analyzer, internals 20*4c3eb207Smrg@cindex static analyzer, internals 21*4c3eb207Smrg 22*4c3eb207Smrg@subsection Overview 23*4c3eb207Smrg 24*4c3eb207SmrgThe analyzer implementation works on the gimple-SSA representation. 25*4c3eb207Smrg(I chose this in the hopes of making it easy to work with LTO to 26*4c3eb207Smrgdo whole-program analysis). 27*4c3eb207Smrg 28*4c3eb207SmrgThe implementation is read-only: it doesn't attempt to change anything, 29*4c3eb207Smrgjust emit warnings. 30*4c3eb207Smrg 31*4c3eb207SmrgThe gimple representation can be seen using @option{-fdump-ipa-analyzer}. 32*4c3eb207Smrg 33*4c3eb207SmrgFirst, we build a @code{supergraph} which combines the callgraph and all 34*4c3eb207Smrgof the CFGs into a single directed graph, with both interprocedural and 35*4c3eb207Smrgintraprocedural edges. The nodes and edges in the supergraph are called 36*4c3eb207Smrg``supernodes'' and ``superedges'', and often referred to in code as 37*4c3eb207Smrg@code{snodes} and @code{sedges}. Basic blocks in the CFGs are split at 38*4c3eb207Smrginterprocedural calls, so there can be more than one supernode per 39*4c3eb207Smrgbasic block. Most statements will be in just one supernode, but a call 40*4c3eb207Smrgstatement can appear in two supernodes: at the end of one for the call, 41*4c3eb207Smrgand again at the start of another for the return. 42*4c3eb207Smrg 43*4c3eb207SmrgThe supergraph can be seen using @option{-fdump-analyzer-supergraph}. 44*4c3eb207Smrg 45*4c3eb207SmrgWe then build an @code{analysis_plan} which walks the callgraph to 46*4c3eb207Smrgdetermine which calls might be suitable for being summarized (rather 47*4c3eb207Smrgthan fully explored) and thus in what order to explore the functions. 48*4c3eb207Smrg 49*4c3eb207SmrgNext is the heart of the analyzer: we use a worklist to explore state 50*4c3eb207Smrgwithin the supergraph, building an "exploded graph". 51*4c3eb207SmrgNodes in the exploded graph correspond to <point,@w{ }state> pairs, as in 52*4c3eb207Smrg "Precise Interprocedural Dataflow Analysis via Graph Reachability" 53*4c3eb207Smrg (Thomas Reps, Susan Horwitz and Mooly Sagiv). 54*4c3eb207Smrg 55*4c3eb207SmrgWe reuse nodes for <point, state> pairs we've already seen, and avoid 56*4c3eb207Smrgtracking state too closely, so that (hopefully) we rapidly converge 57*4c3eb207Smrgon a final exploded graph, and terminate the analysis. We also bail 58*4c3eb207Smrgout if the number of exploded <end-of-basic-block, state> nodes gets 59*4c3eb207Smrglarger than a particular multiple of the total number of basic blocks 60*4c3eb207Smrg(to ensure termination in the face of pathological state-explosion 61*4c3eb207Smrgcases, or bugs). We also stop exploring a point once we hit a limit 62*4c3eb207Smrgof states for that point. 63*4c3eb207Smrg 64*4c3eb207SmrgWe can identify problems directly when processing a <point,@w{ }state> 65*4c3eb207Smrginstance. For example, if we're finding the successors of 66*4c3eb207Smrg 67*4c3eb207Smrg@smallexample 68*4c3eb207Smrg <point: before-stmt: "free (ptr);", 69*4c3eb207Smrg state: @{"ptr": freed@}> 70*4c3eb207Smrg@end smallexample 71*4c3eb207Smrg 72*4c3eb207Smrgthen we can detect a double-free of "ptr". We can then emit a path 73*4c3eb207Smrgto reach the problem by finding the simplest route through the graph. 74*4c3eb207Smrg 75*4c3eb207SmrgProgram points in the analysis are much more fine-grained than in the 76*4c3eb207SmrgCFG and supergraph, with points (and thus potentially exploded nodes) 77*4c3eb207Smrgfor various events, including before individual statements. 78*4c3eb207SmrgBy default the exploded graph merges multiple consecutive statements 79*4c3eb207Smrgin a supernode into one exploded edge to minimize the size of the 80*4c3eb207Smrgexploded graph. This can be suppressed via 81*4c3eb207Smrg@option{-fanalyzer-fine-grained}. 82*4c3eb207SmrgThe fine-grained approach seems to make things simpler and more debuggable 83*4c3eb207Smrgthat other approaches I tried, in that each point is responsible for one 84*4c3eb207Smrgthing. 85*4c3eb207Smrg 86*4c3eb207SmrgProgram points in the analysis also have a "call string" identifying the 87*4c3eb207Smrgstack of callsites below them, so that paths in the exploded graph 88*4c3eb207Smrgcorrespond to interprocedurally valid paths: we always return to the 89*4c3eb207Smrgcorrect call site, propagating state information accordingly. 90*4c3eb207SmrgWe avoid infinite recursion by stopping the analysis if a callsite 91*4c3eb207Smrgappears more than @code{analyzer-max-recursion-depth} in a callstring 92*4c3eb207Smrg(defaulting to 2). 93*4c3eb207Smrg 94*4c3eb207Smrg@subsection Graphs 95*4c3eb207Smrg 96*4c3eb207SmrgNodes and edges in the exploded graph are called ``exploded nodes'' and 97*4c3eb207Smrg``exploded edges'' and often referred to in the code as 98*4c3eb207Smrg@code{enodes} and @code{eedges} (especially when distinguishing them 99*4c3eb207Smrgfrom the @code{snodes} and @code{sedges} in the supergraph). 100*4c3eb207Smrg 101*4c3eb207SmrgEach graph numbers its nodes, giving unique identifiers - supernodes 102*4c3eb207Smrgare referred to throughout dumps in the form @samp{SN': @var{index}} and 103*4c3eb207Smrgexploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and 104*4c3eb207Smrg@samp{EN:29}). 105*4c3eb207Smrg 106*4c3eb207SmrgThe supergraph can be seen using @option{-fdump-analyzer-supergraph-graph}. 107*4c3eb207Smrg 108*4c3eb207SmrgThe exploded graph can be seen using @option{-fdump-analyzer-exploded-graph} 109*4c3eb207Smrgand other dump options. Exploded nodes are color-coded in the .dot output 110*4c3eb207Smrgbased on state-machine states to make it easier to see state changes at 111*4c3eb207Smrga glance. 112*4c3eb207Smrg 113*4c3eb207Smrg@subsection State Tracking 114*4c3eb207Smrg 115*4c3eb207SmrgThere's a tension between: 116*4c3eb207Smrg@itemize @bullet 117*4c3eb207Smrg@item 118*4c3eb207Smrgprecision of analysis in the straight-line case, vs 119*4c3eb207Smrg@item 120*4c3eb207Smrgexponential blow-up in the face of control flow. 121*4c3eb207Smrg@end itemize 122*4c3eb207Smrg 123*4c3eb207SmrgFor example, in general, given this CFG: 124*4c3eb207Smrg 125*4c3eb207Smrg@smallexample 126*4c3eb207Smrg A 127*4c3eb207Smrg / \ 128*4c3eb207Smrg B C 129*4c3eb207Smrg \ / 130*4c3eb207Smrg D 131*4c3eb207Smrg / \ 132*4c3eb207Smrg E F 133*4c3eb207Smrg \ / 134*4c3eb207Smrg G 135*4c3eb207Smrg@end smallexample 136*4c3eb207Smrg 137*4c3eb207Smrgwe want to avoid differences in state-tracking in B and C from 138*4c3eb207Smrgleading to blow-up. If we don't prevent state blowup, we end up 139*4c3eb207Smrgwith exponential growth of the exploded graph like this: 140*4c3eb207Smrg 141*4c3eb207Smrg@smallexample 142*4c3eb207Smrg 143*4c3eb207Smrg 1:A 144*4c3eb207Smrg / \ 145*4c3eb207Smrg / \ 146*4c3eb207Smrg / \ 147*4c3eb207Smrg 2:B 3:C 148*4c3eb207Smrg | | 149*4c3eb207Smrg 4:D 5:D (2 exploded nodes for D) 150*4c3eb207Smrg / \ / \ 151*4c3eb207Smrg 6:E 7:F 8:E 9:F 152*4c3eb207Smrg | | | | 153*4c3eb207Smrg 10:G 11:G 12:G 13:G (4 exploded nodes for G) 154*4c3eb207Smrg 155*4c3eb207Smrg@end smallexample 156*4c3eb207Smrg 157*4c3eb207SmrgSimilar issues arise with loops. 158*4c3eb207Smrg 159*4c3eb207SmrgTo prevent this, we follow various approaches: 160*4c3eb207Smrg 161*4c3eb207Smrg@enumerate a 162*4c3eb207Smrg@item 163*4c3eb207Smrgstate pruning: which tries to discard state that won't be relevant 164*4c3eb207Smrglater on withing the function. 165*4c3eb207SmrgThis can be disabled via @option{-fno-analyzer-state-purge}. 166*4c3eb207Smrg 167*4c3eb207Smrg@item 168*4c3eb207Smrgstate merging. We can try to find the commonality between two 169*4c3eb207Smrgprogram_state instances to make a third, simpler program_state. 170*4c3eb207SmrgWe have two strategies here: 171*4c3eb207Smrg 172*4c3eb207Smrg @enumerate 173*4c3eb207Smrg @item 174*4c3eb207Smrg the worklist keeps new nodes for the same program_point together, 175*4c3eb207Smrg and tries to merge them before processing, and thus before they have 176*4c3eb207Smrg successors. Hence, in the above, the two nodes for D (4 and 5) reach 177*4c3eb207Smrg the front of the worklist together, and we create a node for D with 178*4c3eb207Smrg the merger of the incoming states. 179*4c3eb207Smrg 180*4c3eb207Smrg @item 181*4c3eb207Smrg try merging with the state of existing enodes for the program_point 182*4c3eb207Smrg (which may have already been explored). There will be duplication, 183*4c3eb207Smrg but only one set of duplication; subsequent duplicates are more likely 184*4c3eb207Smrg to hit the cache. In particular, (hopefully) all merger chains are 185*4c3eb207Smrg finite, and so we guarantee termination. 186*4c3eb207Smrg This is intended to help with loops: we ought to explore the first 187*4c3eb207Smrg iteration, and then have a "subsequent iterations" exploration, 188*4c3eb207Smrg which uses a state merged from that of the first, to be more abstract. 189*4c3eb207Smrg @end enumerate 190*4c3eb207Smrg 191*4c3eb207SmrgWe avoid merging pairs of states that have state-machine differences, 192*4c3eb207Smrgas these are the kinds of differences that are likely to be most 193*4c3eb207Smrginteresting. So, for example, given: 194*4c3eb207Smrg 195*4c3eb207Smrg@smallexample 196*4c3eb207Smrg if (condition) 197*4c3eb207Smrg ptr = malloc (size); 198*4c3eb207Smrg else 199*4c3eb207Smrg ptr = local_buf; 200*4c3eb207Smrg 201*4c3eb207Smrg .... do things with 'ptr' 202*4c3eb207Smrg 203*4c3eb207Smrg if (condition) 204*4c3eb207Smrg free (ptr); 205*4c3eb207Smrg 206*4c3eb207Smrg ...etc 207*4c3eb207Smrg@end smallexample 208*4c3eb207Smrg 209*4c3eb207Smrgthen we end up with an exploded graph that looks like this: 210*4c3eb207Smrg 211*4c3eb207Smrg@smallexample 212*4c3eb207Smrg 213*4c3eb207Smrg if (condition) 214*4c3eb207Smrg / T \ F 215*4c3eb207Smrg --------- ---------- 216*4c3eb207Smrg / \ 217*4c3eb207Smrg ptr = malloc (size) ptr = local_buf 218*4c3eb207Smrg | | 219*4c3eb207Smrg copy of copy of 220*4c3eb207Smrg "do things with 'ptr'" "do things with 'ptr'" 221*4c3eb207Smrg with ptr: heap-allocated with ptr: stack-allocated 222*4c3eb207Smrg | | 223*4c3eb207Smrg if (condition) if (condition) 224*4c3eb207Smrg | known to be T | known to be F 225*4c3eb207Smrg free (ptr); | 226*4c3eb207Smrg \ / 227*4c3eb207Smrg ----------------------------- 228*4c3eb207Smrg | ('ptr' is pruned, so states can be merged) 229*4c3eb207Smrg etc 230*4c3eb207Smrg 231*4c3eb207Smrg@end smallexample 232*4c3eb207Smrg 233*4c3eb207Smrgwhere some duplication has occurred, but only for the places where the 234*4c3eb207Smrgthe different paths are worth exploringly separately. 235*4c3eb207Smrg 236*4c3eb207SmrgMerging can be disabled via @option{-fno-analyzer-state-merge}. 237*4c3eb207Smrg@end enumerate 238*4c3eb207Smrg 239*4c3eb207Smrg@subsection Region Model 240*4c3eb207Smrg 241*4c3eb207SmrgPart of the state stored at a @code{exploded_node} is a @code{region_model}. 242*4c3eb207SmrgThis is an implementation of the region-based ternary model described in 243*4c3eb207Smrg@url{http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf, 244*4c3eb207Smrg"A Memory Model for Static Analysis of C Programs"} 245*4c3eb207Smrg(Zhongxing Xu, Ted Kremenek, and Jian Zhang). 246*4c3eb207Smrg 247*4c3eb207SmrgA @code{region_model} encapsulates a representation of the state of 248*4c3eb207Smrgmemory, with a tree of @code{region} instances, along with their associated 249*4c3eb207Smrgvalues. The representation is graph-like because values can be pointers 250*4c3eb207Smrgto regions. It also stores a constraint_manager, capturing relationships 251*4c3eb207Smrgbetween the values. 252*4c3eb207Smrg 253*4c3eb207SmrgBecause each node in the @code{exploded_graph} has a @code{region_model}, 254*4c3eb207Smrgand each of the latter is graph-like, the @code{exploded_graph} is in some 255*4c3eb207Smrgways a graph of graphs. 256*4c3eb207Smrg 257*4c3eb207SmrgHere's an example of printing a @code{region_model}, showing the ASCII-art 258*4c3eb207Smrgused to visualize the region hierarchy (colorized when printing to stderr): 259*4c3eb207Smrg 260*4c3eb207Smrg@smallexample 261*4c3eb207Smrg(gdb) call debug (*this) 262*4c3eb207Smrgr0: @{kind: 'root', parent: null, sval: null@} 263*4c3eb207Smrg|-stack: r1: @{kind: 'stack', parent: r0, sval: sv1@} 264*4c3eb207Smrg| |: sval: sv1: @{poisoned: uninit@} 265*4c3eb207Smrg| |-frame for 'test': r2: @{kind: 'frame', parent: r1, sval: null, map: @{'ptr_3': r3@}, function: 'test', depth: 0@} 266*4c3eb207Smrg| | `-'ptr_3': r3: @{kind: 'map', parent: r2, sval: sv3, type: 'void *', map: @{@}@} 267*4c3eb207Smrg| | |: sval: sv3: @{type: 'void *', unknown@} 268*4c3eb207Smrg| | |: type: 'void *' 269*4c3eb207Smrg| `-frame for 'calls_malloc': r4: @{kind: 'frame', parent: r1, sval: null, map: @{'result_3': r7, '_4': r8, '<anonymous>': r5@}, function: 'calls_malloc', depth: 1@} 270*4c3eb207Smrg| |-'<anonymous>': r5: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@} 271*4c3eb207Smrg| | |: sval: sv4: @{type: 'void *', &r6@} 272*4c3eb207Smrg| | |: type: 'void *' 273*4c3eb207Smrg| |-'result_3': r7: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@} 274*4c3eb207Smrg| | |: sval: sv4: @{type: 'void *', &r6@} 275*4c3eb207Smrg| | |: type: 'void *' 276*4c3eb207Smrg| `-'_4': r8: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@} 277*4c3eb207Smrg| |: sval: sv4: @{type: 'void *', &r6@} 278*4c3eb207Smrg| |: type: 'void *' 279*4c3eb207Smrg`-heap: r9: @{kind: 'heap', parent: r0, sval: sv2@} 280*4c3eb207Smrg |: sval: sv2: @{poisoned: uninit@} 281*4c3eb207Smrg `-r6: @{kind: 'symbolic', parent: r9, sval: null, map: @{@}@} 282*4c3eb207Smrgsvalues: 283*4c3eb207Smrg sv0: @{type: 'size_t', '1024'@} 284*4c3eb207Smrg sv1: @{poisoned: uninit@} 285*4c3eb207Smrg sv2: @{poisoned: uninit@} 286*4c3eb207Smrg sv3: @{type: 'void *', unknown@} 287*4c3eb207Smrg sv4: @{type: 'void *', &r6@} 288*4c3eb207Smrgconstraint manager: 289*4c3eb207Smrg equiv classes: 290*4c3eb207Smrg ec0: @{sv0 == '1024'@} 291*4c3eb207Smrg ec1: @{sv4@} 292*4c3eb207Smrg constraints: 293*4c3eb207Smrg@end smallexample 294*4c3eb207Smrg 295*4c3eb207SmrgThis is the state at the point of returning from @code{calls_malloc} back 296*4c3eb207Smrgto @code{test} in the following: 297*4c3eb207Smrg 298*4c3eb207Smrg@smallexample 299*4c3eb207Smrgvoid * 300*4c3eb207Smrgcalls_malloc (void) 301*4c3eb207Smrg@{ 302*4c3eb207Smrg void *result = malloc (1024); 303*4c3eb207Smrg return result; 304*4c3eb207Smrg@} 305*4c3eb207Smrg 306*4c3eb207Smrgvoid test (void) 307*4c3eb207Smrg@{ 308*4c3eb207Smrg void *ptr = calls_malloc (); 309*4c3eb207Smrg /* etc. */ 310*4c3eb207Smrg@} 311*4c3eb207Smrg@end smallexample 312*4c3eb207Smrg 313*4c3eb207SmrgThe ``root'' region (``r0'') has a ``stack'' child (``r1''), with two 314*4c3eb207Smrgchildren: a frame for @code{test} (``r2''), and a frame for 315*4c3eb207Smrg@code{calls_malloc} (``r4''). These frame regions have child regions for 316*4c3eb207Smrgstoring their local variables. For example, the return region 317*4c3eb207Smrgand that of various other regions within the ``calls_malloc'' frame all have 318*4c3eb207Smrgvalue ``sv4'', a pointer to a heap-allocated region ``r6''. Within the parent 319*4c3eb207Smrgframe, @code{ptr_3} has value ``sv3'', an unknown @code{void *}. 320*4c3eb207Smrg 321*4c3eb207Smrg@subsection Analyzer Paths 322*4c3eb207Smrg 323*4c3eb207SmrgWe need to explain to the user what the problem is, and to persuade them 324*4c3eb207Smrgthat there really is a problem. Hence having a @code{diagnostic_path} 325*4c3eb207Smrgisn't just an incidental detail of the analyzer; it's required. 326*4c3eb207Smrg 327*4c3eb207SmrgPaths ought to be: 328*4c3eb207Smrg@itemize @bullet 329*4c3eb207Smrg@item 330*4c3eb207Smrginterprocedurally-valid 331*4c3eb207Smrg@item 332*4c3eb207Smrgfeasible 333*4c3eb207Smrg@end itemize 334*4c3eb207Smrg 335*4c3eb207SmrgWithout state-merging, all paths in the exploded graph are feasible 336*4c3eb207Smrg(in terms of constraints being satisified). 337*4c3eb207SmrgWith state-merging, paths in the exploded graph can be infeasible. 338*4c3eb207Smrg 339*4c3eb207SmrgWe collate warnings and only emit them for the simplest path 340*4c3eb207Smrge.g. for a bug in a utility function, with lots of routes to calling it, 341*4c3eb207Smrgwe only emit the simplest path (which could be intraprocedural, if 342*4c3eb207Smrgit can be reproduced without a caller). We apply a check that 343*4c3eb207Smrgeach duplicate warning's shortest path is feasible, rejecting any 344*4c3eb207Smrgwarnings for which the shortest path is infeasible (which could lead to 345*4c3eb207Smrgfalse negatives). 346*4c3eb207Smrg 347*4c3eb207SmrgWe use the shortest feasible @code{exploded_path} through the 348*4c3eb207Smrg@code{exploded_graph} (a list of @code{exploded_edge *}) to build a 349*4c3eb207Smrg@code{diagnostic_path} (a list of events for the diagnostic subsystem) - 350*4c3eb207Smrgspecifically a @code{checker_path}. 351*4c3eb207Smrg 352*4c3eb207SmrgHaving built the @code{checker_path}, we prune it to try to eliminate 353*4c3eb207Smrgevents that aren't relevant, to minimize how much the user has to read. 354*4c3eb207Smrg 355*4c3eb207SmrgAfter pruning, we notify each event in the path of its ID and record the 356*4c3eb207SmrgIDs of interesting events, allowing for events to refer to other events 357*4c3eb207Smrgin their descriptions. The @code{pending_diagnostic} class has various 358*4c3eb207Smrgvfuncs to support emitting more precise descriptions, so that e.g. 359*4c3eb207Smrg 360*4c3eb207Smrg@itemize @bullet 361*4c3eb207Smrg@item 362*4c3eb207Smrga deref-of-unchecked-malloc diagnostic might use: 363*4c3eb207Smrg@smallexample 364*4c3eb207Smrg returning possibly-NULL pointer to 'make_obj' from 'allocator' 365*4c3eb207Smrg@end smallexample 366*4c3eb207Smrgfor a @code{return_event} to make it clearer how the unchecked value moves 367*4c3eb207Smrgfrom callee back to caller 368*4c3eb207Smrg@item 369*4c3eb207Smrga double-free diagnostic might use: 370*4c3eb207Smrg@smallexample 371*4c3eb207Smrg second 'free' here; first 'free' was at (3) 372*4c3eb207Smrg@end smallexample 373*4c3eb207Smrgand a use-after-free might use 374*4c3eb207Smrg@smallexample 375*4c3eb207Smrg use after 'free' here; memory was freed at (2) 376*4c3eb207Smrg@end smallexample 377*4c3eb207Smrg@end itemize 378*4c3eb207Smrg 379*4c3eb207SmrgAt this point we can emit the diagnostic. 380*4c3eb207Smrg 381*4c3eb207Smrg@subsection Limitations 382*4c3eb207Smrg 383*4c3eb207Smrg@itemize @bullet 384*4c3eb207Smrg@item 385*4c3eb207SmrgOnly for C so far 386*4c3eb207Smrg@item 387*4c3eb207SmrgThe implementation of call summaries is currently very simplistic. 388*4c3eb207Smrg@item 389*4c3eb207SmrgLack of function pointer analysis 390*4c3eb207Smrg@item 391*4c3eb207SmrgThe constraint-handling code assumes reflexivity in some places 392*4c3eb207Smrg(that values are equal to themselves), which is not the case for NaN. 393*4c3eb207SmrgAs a simple workaround, constraints on floating-point values are 394*4c3eb207Smrgcurrently ignored. 395*4c3eb207Smrg@item 396*4c3eb207SmrgThe region model code creates lots of little mutable objects at each 397*4c3eb207Smrg@code{region_model} (and thus per @code{exploded_node}) rather than 398*4c3eb207Smrgsharing immutable objects and having the mutable state in the 399*4c3eb207Smrg@code{program_state} or @code{region_model}. The latter approach might be 400*4c3eb207Smrgmore efficient, and might avoid dealing with IDs rather than pointers 401*4c3eb207Smrg(which requires us to impose an ordering to get meaningful equality). 402*4c3eb207Smrg@item 403*4c3eb207SmrgThe region model code doesn't yet support @code{memcpy}. At the 404*4c3eb207Smrggimple-ssa level these have been optimized to statements like this: 405*4c3eb207Smrg@smallexample 406*4c3eb207Smrg_10 = MEM <long unsigned int> [(char * @{ref-all@})&c] 407*4c3eb207SmrgMEM <long unsigned int> [(char * @{ref-all@})&d] = _10; 408*4c3eb207Smrg@end smallexample 409*4c3eb207SmrgPerhaps they could be supported via a new @code{compound_svalue} type. 410*4c3eb207Smrg@item 411*4c3eb207SmrgThere are various other limitations in the region model (grep for TODO/xfail 412*4c3eb207Smrgin the testsuite). 413*4c3eb207Smrg@item 414*4c3eb207SmrgThe constraint_manager's implementation of transitivity is currently too 415*4c3eb207Smrgexpensive to enable by default and so must be manually enabled via 416*4c3eb207Smrg@option{-fanalyzer-transitivity}). 417*4c3eb207Smrg@item 418*4c3eb207SmrgThe checkers are currently hardcoded and don't allow for user extensibility 419*4c3eb207Smrg(e.g. adding allocate/release pairs). 420*4c3eb207Smrg@item 421*4c3eb207SmrgAlthough the analyzer's test suite has a proof-of-concept test case for 422*4c3eb207SmrgLTO, LTO support hasn't had extensive testing. There are various 423*4c3eb207Smrglang-specific things in the analyzer that assume C rather than LTO. 424*4c3eb207SmrgFor example, SSA names are printed to the user in ``raw'' form, rather 425*4c3eb207Smrgthan printing the underlying variable name. 426*4c3eb207Smrg@end itemize 427*4c3eb207Smrg 428*4c3eb207SmrgSome ideas for other checkers 429*4c3eb207Smrg@itemize @bullet 430*4c3eb207Smrg@item 431*4c3eb207SmrgFile-descriptor-based APIs 432*4c3eb207Smrg@item 433*4c3eb207SmrgLinux kernel internal APIs 434*4c3eb207Smrg@item 435*4c3eb207SmrgSignal handling 436*4c3eb207Smrg@end itemize 437*4c3eb207Smrg 438*4c3eb207Smrg@node Debugging the Analyzer 439*4c3eb207Smrg@section Debugging the Analyzer 440*4c3eb207Smrg@cindex analyzer, debugging 441*4c3eb207Smrg@cindex static analyzer, debugging 442*4c3eb207Smrg 443*4c3eb207Smrg@subsection Special Functions for Debugging the Analyzer 444*4c3eb207Smrg 445*4c3eb207SmrgThe analyzer recognizes various special functions by name, for use 446*4c3eb207Smrgin debugging the analyzer. Declarations can be seen in the testsuite 447*4c3eb207Smrgin @file{analyzer-decls.h}. None of these functions are actually 448*4c3eb207Smrgimplemented. 449*4c3eb207Smrg 450*4c3eb207SmrgAdd: 451*4c3eb207Smrg@smallexample 452*4c3eb207Smrg __analyzer_break (); 453*4c3eb207Smrg@end smallexample 454*4c3eb207Smrgto the source being analyzed to trigger a breakpoint in the analyzer when 455*4c3eb207Smrgthat source is reached. By putting a series of these in the source, it's 456*4c3eb207Smrgmuch easier to effectively step through the program state as it's analyzed. 457*4c3eb207Smrg 458*4c3eb207Smrg@smallexample 459*4c3eb207Smrg__analyzer_dump (); 460*4c3eb207Smrg@end smallexample 461*4c3eb207Smrg 462*4c3eb207Smrgwill dump the copious information about the analyzer's state each time it 463*4c3eb207Smrgreaches the call in its traversal of the source. 464*4c3eb207Smrg 465*4c3eb207Smrg@smallexample 466*4c3eb207Smrg__analyzer_dump_path (); 467*4c3eb207Smrg@end smallexample 468*4c3eb207Smrg 469*4c3eb207Smrgwill emit a placeholder ``note'' diagnostic with a path to that call site, 470*4c3eb207Smrgif the analyzer finds a feasible path to it. 471*4c3eb207Smrg 472*4c3eb207SmrgThe builtin @code{__analyzer_dump_exploded_nodes} will emit a warning 473*4c3eb207Smrgafter analysis containing information on all of the exploded nodes at that 474*4c3eb207Smrgprogram point: 475*4c3eb207Smrg 476*4c3eb207Smrg@smallexample 477*4c3eb207Smrg __analyzer_dump_exploded_nodes (0); 478*4c3eb207Smrg@end smallexample 479*4c3eb207Smrg 480*4c3eb207Smrgwill output the number of ``processed'' nodes, and the IDs of 481*4c3eb207Smrgboth ``processed'' and ``merger'' nodes, such as: 482*4c3eb207Smrg 483*4c3eb207Smrg@smallexample 484*4c3eb207Smrgwarning: 2 processed enodes: [EN: 56, EN: 58] merger(s): [EN: 54-55, EN: 57, EN: 59] 485*4c3eb207Smrg@end smallexample 486*4c3eb207Smrg 487*4c3eb207SmrgWith a non-zero argument 488*4c3eb207Smrg 489*4c3eb207Smrg@smallexample 490*4c3eb207Smrg __analyzer_dump_exploded_nodes (1); 491*4c3eb207Smrg@end smallexample 492*4c3eb207Smrg 493*4c3eb207Smrgit will also dump all of the states within the ``processed'' nodes. 494*4c3eb207Smrg 495*4c3eb207Smrg@smallexample 496*4c3eb207Smrg __analyzer_dump_region_model (); 497*4c3eb207Smrg@end smallexample 498*4c3eb207Smrgwill dump the region_model's state to stderr. 499*4c3eb207Smrg 500*4c3eb207Smrg@smallexample 501*4c3eb207Smrg__analyzer_eval (expr); 502*4c3eb207Smrg@end smallexample 503*4c3eb207Smrgwill emit a warning with text "TRUE", FALSE" or "UNKNOWN" based on the 504*4c3eb207Smrgtruthfulness of the argument. This is useful for writing DejaGnu tests. 505*4c3eb207Smrg 506*4c3eb207Smrg 507*4c3eb207Smrg@subsection Other Debugging Techniques 508*4c3eb207Smrg 509*4c3eb207SmrgOne approach when tracking down where a particular bogus state is 510*4c3eb207Smrgintroduced into the @code{exploded_graph} is to add custom code to 511*4c3eb207Smrg@code{region_model::validate}. 512*4c3eb207Smrg 513*4c3eb207SmrgFor example, this custom code (added to @code{region_model::validate}) 514*4c3eb207Smrgbreaks with an assertion failure when a variable called @code{ptr} 515*4c3eb207Smrgacquires a value that's unknown, using 516*4c3eb207Smrg@code{region_model::get_value_by_name} to locate the variable 517*4c3eb207Smrg 518*4c3eb207Smrg@smallexample 519*4c3eb207Smrg /* Find a variable matching "ptr". */ 520*4c3eb207Smrg svalue_id sid = get_value_by_name ("ptr"); 521*4c3eb207Smrg if (!sid.null_p ()) 522*4c3eb207Smrg @{ 523*4c3eb207Smrg svalue *sval = get_svalue (sid); 524*4c3eb207Smrg gcc_assert (sval->get_kind () != SK_UNKNOWN); 525*4c3eb207Smrg @} 526*4c3eb207Smrg@end smallexample 527*4c3eb207Smrg 528*4c3eb207Smrgmaking it easier to investigate further in a debugger when this occurs. 529