1*7330f729Sjoerg//===----------------------------------------------------------------------===// 2*7330f729Sjoerg// Clang Static Analyzer 3*7330f729Sjoerg//===----------------------------------------------------------------------===// 4*7330f729Sjoerg 5*7330f729Sjoerg= Library Structure = 6*7330f729Sjoerg 7*7330f729SjoergThe analyzer library has two layers: a (low-level) static analysis 8*7330f729Sjoergengine (GRExprEngine.cpp and friends), and some static checkers 9*7330f729Sjoerg(*Checker.cpp). The latter are built on top of the former via the 10*7330f729SjoergChecker and CheckerVisitor interfaces (Checker.h and 11*7330f729SjoergCheckerVisitor.h). The Checker interface is designed to be minimal 12*7330f729Sjoergand simple for checker writers, and attempts to isolate them from much 13*7330f729Sjoergof the gore of the internal analysis engine. 14*7330f729Sjoerg 15*7330f729Sjoerg= How It Works = 16*7330f729Sjoerg 17*7330f729SjoergThe analyzer is inspired by several foundational research papers ([1], 18*7330f729Sjoerg[2]). (FIXME: kremenek to add more links) 19*7330f729Sjoerg 20*7330f729SjoergIn a nutshell, the analyzer is basically a source code simulator that 21*7330f729Sjoergtraces out possible paths of execution. The state of the program 22*7330f729Sjoerg(values of variables and expressions) is encapsulated by the state 23*7330f729Sjoerg(ProgramState). A location in the program is called a program point 24*7330f729Sjoerg(ProgramPoint), and the combination of state and program point is a 25*7330f729Sjoergnode in an exploded graph (ExplodedGraph). The term "exploded" comes 26*7330f729Sjoergfrom exploding the control-flow edges in the control-flow graph (CFG). 27*7330f729Sjoerg 28*7330f729SjoergConceptually the analyzer does a reachability analysis through the 29*7330f729SjoergExplodedGraph. We start at a root node, which has the entry program 30*7330f729Sjoergpoint and initial state, and then simulate transitions by analyzing 31*7330f729Sjoergindividual expressions. The analysis of an expression can cause the 32*7330f729Sjoergstate to change, resulting in a new node in the ExplodedGraph with an 33*7330f729Sjoergupdated program point and an updated state. A bug is found by hitting 34*7330f729Sjoerga node that satisfies some "bug condition" (basically a violation of a 35*7330f729Sjoergchecking invariant). 36*7330f729Sjoerg 37*7330f729SjoergThe analyzer traces out multiple paths by reasoning about branches and 38*7330f729Sjoergthen bifurcating the state: on the true branch the conditions of the 39*7330f729Sjoergbranch are assumed to be true and on the false branch the conditions 40*7330f729Sjoergof the branch are assumed to be false. Such "assumptions" create 41*7330f729Sjoergconstraints on the values of the program, and those constraints are 42*7330f729Sjoergrecorded in the ProgramState object (and are manipulated by the 43*7330f729SjoergConstraintManager). If assuming the conditions of a branch would 44*7330f729Sjoergcause the constraints to be unsatisfiable, the branch is considered 45*7330f729Sjoerginfeasible and that path is not taken. This is how we get 46*7330f729Sjoergpath-sensitivity. We reduce exponential blow-up by caching nodes. If 47*7330f729Sjoerga new node with the same state and program point as an existing node 48*7330f729Sjoergwould get generated, the path "caches out" and we simply reuse the 49*7330f729Sjoergexisting node. Thus the ExplodedGraph is not a DAG; it can contain 50*7330f729Sjoergcycles as paths loop back onto each other and cache out. 51*7330f729Sjoerg 52*7330f729SjoergProgramState and ExplodedNodes are basically immutable once created. Once 53*7330f729Sjoergone creates a ProgramState, you need to create a new one to get a new 54*7330f729SjoergProgramState. This immutability is key since the ExplodedGraph represents 55*7330f729Sjoergthe behavior of the analyzed program from the entry point. To 56*7330f729Sjoergrepresent these efficiently, we use functional data structures (e.g., 57*7330f729SjoergImmutableMaps) which share data between instances. 58*7330f729Sjoerg 59*7330f729SjoergFinally, individual Checkers work by also manipulating the analysis 60*7330f729Sjoergstate. The analyzer engine talks to them via a visitor interface. 61*7330f729SjoergFor example, the PreVisitCallExpr() method is called by GRExprEngine 62*7330f729Sjoergto tell the Checker that we are about to analyze a CallExpr, and the 63*7330f729Sjoergchecker is asked to check for any preconditions that might not be 64*7330f729Sjoergsatisfied. The checker can do nothing, or it can generate a new 65*7330f729SjoergProgramState and ExplodedNode which contains updated checker state. If it 66*7330f729Sjoergfinds a bug, it can tell the BugReporter object about the bug, 67*7330f729Sjoergproviding it an ExplodedNode which is the last node in the path that 68*7330f729Sjoergtriggered the problem. 69*7330f729Sjoerg 70*7330f729Sjoerg= Notes about C++ = 71*7330f729Sjoerg 72*7330f729SjoergSince now constructors are seen before the variable that is constructed 73*7330f729Sjoergin the CFG, we create a temporary object as the destination region that 74*7330f729Sjoergis constructed into. See ExprEngine::VisitCXXConstructExpr(). 75*7330f729Sjoerg 76*7330f729SjoergIn ExprEngine::processCallExit(), we always bind the object region to the 77*7330f729Sjoergevaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the 78*7330f729Sjoergcorresponding lazy compound value if the variable is not a reference, and 79*7330f729Sjoergbind the variable region to the lazy compound value. If the variable 80*7330f729Sjoergis a reference, just use the object region as the initializer value. 81*7330f729Sjoerg 82*7330f729SjoergBefore entering a C++ method (or ctor/dtor), the 'this' region is bound 83*7330f729Sjoergto the object region. In ctors, we synthesize 'this' region with 84*7330f729SjoergCXXRecordDecl*, which means we do not use type qualifiers. In methods, we 85*7330f729Sjoergsynthesize 'this' region with CXXMethodDecl*, which has getThisType() 86*7330f729Sjoergtaking type qualifiers into account. It does not matter we use qualified 87*7330f729Sjoerg'this' region in one method and unqualified 'this' region in another 88*7330f729Sjoergmethod, because we only need to ensure the 'this' region is consistent 89*7330f729Sjoergwhen we synthesize it and create it directly from CXXThisExpr in a single 90*7330f729Sjoergmethod call. 91*7330f729Sjoerg 92*7330f729Sjoerg= Working on the Analyzer = 93*7330f729Sjoerg 94*7330f729SjoergIf you are interested in bringing up support for C++ expressions, the 95*7330f729Sjoergbest place to look is the visitation logic in GRExprEngine, which 96*7330f729Sjoerghandles the simulation of individual expressions. There are plenty of 97*7330f729Sjoergexamples there of how other expressions are handled. 98*7330f729Sjoerg 99*7330f729SjoergIf you are interested in writing checkers, look at the Checker and 100*7330f729SjoergCheckerVisitor interfaces (Checker.h and CheckerVisitor.h). Also look 101*7330f729Sjoergat the files named *Checker.cpp for examples on how you can implement 102*7330f729Sjoergthese interfaces. 103*7330f729Sjoerg 104*7330f729Sjoerg= Debugging the Analyzer = 105*7330f729Sjoerg 106*7330f729SjoergThere are some useful command-line options for debugging. For example: 107*7330f729Sjoerg 108*7330f729Sjoerg$ clang -cc1 -help | grep analyze 109*7330f729Sjoerg -analyze-function <value> 110*7330f729Sjoerg -analyzer-display-progress 111*7330f729Sjoerg -analyzer-viz-egraph-graphviz 112*7330f729Sjoerg ... 113*7330f729Sjoerg 114*7330f729SjoergThe first allows you to specify only analyzing a specific function. 115*7330f729SjoergThe second prints to the console what function is being analyzed. The 116*7330f729Sjoergthird generates a graphviz dot file of the ExplodedGraph. This is 117*7330f729Sjoergextremely useful when debugging the analyzer and viewing the 118*7330f729Sjoergsimulation results. 119*7330f729Sjoerg 120*7330f729SjoergOf course, viewing the CFG (Control-Flow Graph) is also useful: 121*7330f729Sjoerg 122*7330f729Sjoerg$ clang -cc1 -help | grep cfg 123*7330f729Sjoerg -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses 124*7330f729Sjoerg -cfg-add-initializers Add C++ initializers to CFGs for all analyses 125*7330f729Sjoerg -cfg-dump Display Control-Flow Graphs 126*7330f729Sjoerg -cfg-view View Control-Flow Graphs using GraphViz 127*7330f729Sjoerg -unoptimized-cfg Generate unoptimized CFGs for all analyses 128*7330f729Sjoerg 129*7330f729Sjoerg-cfg-dump dumps a textual representation of the CFG to the console, 130*7330f729Sjoergand -cfg-view creates a GraphViz representation. 131*7330f729Sjoerg 132*7330f729Sjoerg= References = 133*7330f729Sjoerg 134*7330f729Sjoerg[1] Precise interprocedural dataflow analysis via graph reachability, 135*7330f729Sjoerg T Reps, S Horwitz, and M Sagiv, POPL '95, 136*7330f729Sjoerg http://portal.acm.org/citation.cfm?id=199462 137*7330f729Sjoerg 138*7330f729Sjoerg[2] A memory model for static analysis of C programs, Z Xu, T 139*7330f729Sjoerg Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf 140