xref: /netbsd-src/external/apache2/llvm/dist/clang/lib/StaticAnalyzer/README.txt (revision 7330f729ccf0bd976a06f95fad452fe774fc7fd1)
1*7330f729Sjoerg//===----------------------------------------------------------------------===//
2*7330f729Sjoerg// Clang Static Analyzer
3*7330f729Sjoerg//===----------------------------------------------------------------------===//
4*7330f729Sjoerg
5*7330f729Sjoerg= Library Structure =
6*7330f729Sjoerg
7*7330f729SjoergThe analyzer library has two layers: a (low-level) static analysis
8*7330f729Sjoergengine (GRExprEngine.cpp and friends), and some static checkers
9*7330f729Sjoerg(*Checker.cpp).  The latter are built on top of the former via the
10*7330f729SjoergChecker and CheckerVisitor interfaces (Checker.h and
11*7330f729SjoergCheckerVisitor.h).  The Checker interface is designed to be minimal
12*7330f729Sjoergand simple for checker writers, and attempts to isolate them from much
13*7330f729Sjoergof the gore of the internal analysis engine.
14*7330f729Sjoerg
15*7330f729Sjoerg= How It Works =
16*7330f729Sjoerg
17*7330f729SjoergThe analyzer is inspired by several foundational research papers ([1],
18*7330f729Sjoerg[2]).  (FIXME: kremenek to add more links)
19*7330f729Sjoerg
20*7330f729SjoergIn a nutshell, the analyzer is basically a source code simulator that
21*7330f729Sjoergtraces out possible paths of execution.  The state of the program
22*7330f729Sjoerg(values of variables and expressions) is encapsulated by the state
23*7330f729Sjoerg(ProgramState).  A location in the program is called a program point
24*7330f729Sjoerg(ProgramPoint), and the combination of state and program point is a
25*7330f729Sjoergnode in an exploded graph (ExplodedGraph).  The term "exploded" comes
26*7330f729Sjoergfrom exploding the control-flow edges in the control-flow graph (CFG).
27*7330f729Sjoerg
28*7330f729SjoergConceptually the analyzer does a reachability analysis through the
29*7330f729SjoergExplodedGraph.  We start at a root node, which has the entry program
30*7330f729Sjoergpoint and initial state, and then simulate transitions by analyzing
31*7330f729Sjoergindividual expressions.  The analysis of an expression can cause the
32*7330f729Sjoergstate to change, resulting in a new node in the ExplodedGraph with an
33*7330f729Sjoergupdated program point and an updated state.  A bug is found by hitting
34*7330f729Sjoerga node that satisfies some "bug condition" (basically a violation of a
35*7330f729Sjoergchecking invariant).
36*7330f729Sjoerg
37*7330f729SjoergThe analyzer traces out multiple paths by reasoning about branches and
38*7330f729Sjoergthen bifurcating the state: on the true branch the conditions of the
39*7330f729Sjoergbranch are assumed to be true and on the false branch the conditions
40*7330f729Sjoergof the branch are assumed to be false.  Such "assumptions" create
41*7330f729Sjoergconstraints on the values of the program, and those constraints are
42*7330f729Sjoergrecorded in the ProgramState object (and are manipulated by the
43*7330f729SjoergConstraintManager).  If assuming the conditions of a branch would
44*7330f729Sjoergcause the constraints to be unsatisfiable, the branch is considered
45*7330f729Sjoerginfeasible and that path is not taken.  This is how we get
46*7330f729Sjoergpath-sensitivity.  We reduce exponential blow-up by caching nodes.  If
47*7330f729Sjoerga new node with the same state and program point as an existing node
48*7330f729Sjoergwould get generated, the path "caches out" and we simply reuse the
49*7330f729Sjoergexisting node.  Thus the ExplodedGraph is not a DAG; it can contain
50*7330f729Sjoergcycles as paths loop back onto each other and cache out.
51*7330f729Sjoerg
52*7330f729SjoergProgramState and ExplodedNodes are basically immutable once created.  Once
53*7330f729Sjoergone creates a ProgramState, you need to create a new one to get a new
54*7330f729SjoergProgramState.  This immutability is key since the ExplodedGraph represents
55*7330f729Sjoergthe behavior of the analyzed program from the entry point.  To
56*7330f729Sjoergrepresent these efficiently, we use functional data structures (e.g.,
57*7330f729SjoergImmutableMaps) which share data between instances.
58*7330f729Sjoerg
59*7330f729SjoergFinally, individual Checkers work by also manipulating the analysis
60*7330f729Sjoergstate.  The analyzer engine talks to them via a visitor interface.
61*7330f729SjoergFor example, the PreVisitCallExpr() method is called by GRExprEngine
62*7330f729Sjoergto tell the Checker that we are about to analyze a CallExpr, and the
63*7330f729Sjoergchecker is asked to check for any preconditions that might not be
64*7330f729Sjoergsatisfied.  The checker can do nothing, or it can generate a new
65*7330f729SjoergProgramState and ExplodedNode which contains updated checker state.  If it
66*7330f729Sjoergfinds a bug, it can tell the BugReporter object about the bug,
67*7330f729Sjoergproviding it an ExplodedNode which is the last node in the path that
68*7330f729Sjoergtriggered the problem.
69*7330f729Sjoerg
70*7330f729Sjoerg= Notes about C++ =
71*7330f729Sjoerg
72*7330f729SjoergSince now constructors are seen before the variable that is constructed
73*7330f729Sjoergin the CFG, we create a temporary object as the destination region that
74*7330f729Sjoergis constructed into. See ExprEngine::VisitCXXConstructExpr().
75*7330f729Sjoerg
76*7330f729SjoergIn ExprEngine::processCallExit(), we always bind the object region to the
77*7330f729Sjoergevaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the
78*7330f729Sjoergcorresponding lazy compound value if the variable is not a reference, and
79*7330f729Sjoergbind the variable region to the lazy compound value. If the variable
80*7330f729Sjoergis a reference, just use the object region as the initializer value.
81*7330f729Sjoerg
82*7330f729SjoergBefore entering a C++ method (or ctor/dtor), the 'this' region is bound
83*7330f729Sjoergto the object region. In ctors, we synthesize 'this' region with
84*7330f729SjoergCXXRecordDecl*, which means we do not use type qualifiers. In methods, we
85*7330f729Sjoergsynthesize 'this' region with CXXMethodDecl*, which has getThisType()
86*7330f729Sjoergtaking type qualifiers into account. It does not matter we use qualified
87*7330f729Sjoerg'this' region in one method and unqualified 'this' region in another
88*7330f729Sjoergmethod, because we only need to ensure the 'this' region is consistent
89*7330f729Sjoergwhen we synthesize it and create it directly from CXXThisExpr in a single
90*7330f729Sjoergmethod call.
91*7330f729Sjoerg
92*7330f729Sjoerg= Working on the Analyzer =
93*7330f729Sjoerg
94*7330f729SjoergIf you are interested in bringing up support for C++ expressions, the
95*7330f729Sjoergbest place to look is the visitation logic in GRExprEngine, which
96*7330f729Sjoerghandles the simulation of individual expressions.  There are plenty of
97*7330f729Sjoergexamples there of how other expressions are handled.
98*7330f729Sjoerg
99*7330f729SjoergIf you are interested in writing checkers, look at the Checker and
100*7330f729SjoergCheckerVisitor interfaces (Checker.h and CheckerVisitor.h).  Also look
101*7330f729Sjoergat the files named *Checker.cpp for examples on how you can implement
102*7330f729Sjoergthese interfaces.
103*7330f729Sjoerg
104*7330f729Sjoerg= Debugging the Analyzer =
105*7330f729Sjoerg
106*7330f729SjoergThere are some useful command-line options for debugging.  For example:
107*7330f729Sjoerg
108*7330f729Sjoerg$ clang -cc1 -help | grep analyze
109*7330f729Sjoerg -analyze-function <value>
110*7330f729Sjoerg -analyzer-display-progress
111*7330f729Sjoerg -analyzer-viz-egraph-graphviz
112*7330f729Sjoerg ...
113*7330f729Sjoerg
114*7330f729SjoergThe first allows you to specify only analyzing a specific function.
115*7330f729SjoergThe second prints to the console what function is being analyzed.  The
116*7330f729Sjoergthird generates a graphviz dot file of the ExplodedGraph.  This is
117*7330f729Sjoergextremely useful when debugging the analyzer and viewing the
118*7330f729Sjoergsimulation results.
119*7330f729Sjoerg
120*7330f729SjoergOf course, viewing the CFG (Control-Flow Graph) is also useful:
121*7330f729Sjoerg
122*7330f729Sjoerg$ clang -cc1 -help | grep cfg
123*7330f729Sjoerg -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses
124*7330f729Sjoerg -cfg-add-initializers   Add C++ initializers to CFGs for all analyses
125*7330f729Sjoerg -cfg-dump               Display Control-Flow Graphs
126*7330f729Sjoerg -cfg-view               View Control-Flow Graphs using GraphViz
127*7330f729Sjoerg -unoptimized-cfg        Generate unoptimized CFGs for all analyses
128*7330f729Sjoerg
129*7330f729Sjoerg-cfg-dump dumps a textual representation of the CFG to the console,
130*7330f729Sjoergand -cfg-view creates a GraphViz representation.
131*7330f729Sjoerg
132*7330f729Sjoerg= References =
133*7330f729Sjoerg
134*7330f729Sjoerg[1] Precise interprocedural dataflow analysis via graph reachability,
135*7330f729Sjoerg    T Reps, S Horwitz, and M Sagiv, POPL '95,
136*7330f729Sjoerg    http://portal.acm.org/citation.cfm?id=199462
137*7330f729Sjoerg
138*7330f729Sjoerg[2] A memory model for static analysis of C programs, Z Xu, T
139*7330f729Sjoerg    Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf
140