<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Open Projects</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->
<div id="content">

<h1>Open Projects</h1>

<p>This page lists several projects that would boost the analyzer's usability and
power. Most of the projects listed here are infrastructure-related, so this list
complements the <a href="potential_checkers.html">potential checkers
list</a>. If you are interested in tackling one of these, please send an email
to the <a href="https://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev
mailing list</a> to notify other members of the community.</p>

<ul>
  <li>Release checkers from "alpha"
    <p>New checkers which were contributed to the analyzer,
    but have not passed a rigorous evaluation process,
    are committed as "alpha checkers" (from "alpha version"),
    and are not enabled by default.</p>

    <p>Ideally, only the checkers which are actively being worked on should be in
    "alpha",
    but over the years the development of many of those has stalled.
    Such checkers should either be improved
    up to a point where they can be enabled by default,
    or removed from the analyzer entirely.</p>

    <ul>
      <li><code>alpha.security.ArrayBound</code> and
      <code>alpha.security.ArrayBoundV2</code>
      <p>Array bounds checking is a desired feature,
      but having an acceptable rate of false positives might not be possible
      without proper
      <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">loop widening</a> support.
      Additionally, it might be more promising to perform index checking based on
      <a href="https://en.wikipedia.org/wiki/Taint_checking">tainted</a> index values.</p>
      <p><i>(Difficulty: Medium)</i></p>
      </li>

      <li><code>alpha.unix.StreamChecker</code>
        <p>A <code>SimpleStreamChecker</code> was presented in the "Building a Checker in 24
        Hours" talk
        (<a href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
        <a href="https://youtu.be/kdxlsP5QVPw">video</a>).</p>

        <p>This alpha checker is an attempt to write a production-grade stream checker.
        However, it was found to have an unacceptably high false positive rate.
        One problem found was that eagerly splitting the state
        based on whether the system call may fail leads to too many reports.
        A <em>delayed</em> split, where the implication is stored in the state
        (similarly to nullability implications in <code>TrustNonnullChecker</code>),
        may produce much better results.</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>
    </ul>
  </li>

  <li>Improve C++ support
  <ul>
    <li>Handle construction as part of aggregate initialization.
      <p><a href="https://en.cppreference.com/w/cpp/language/aggregate_initialization">Aggregates</a>
      are objects that can be brace-initialized without calling a
      constructor (that is, <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html">
      CXXConstructExpr</a></code> does not occur in the AST),
      although brace-initialization may still call
      constructors for their fields and base classes.
      These
      constructors of sub-objects need to know what object they are constructing.
      Moreover, if the aggregate contains
      references, lifetime extension needs to be properly modeled.</p>

      <p>One can start untangling this problem by trying to replace the
      current ad-hoc <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html">
      ParentMap</a></code> lookup in the <a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430">
      <code>CXXConstructExpr::CK_NonVirtualBase</code></a> branch of
      <code>ExprEngine::VisitCXXConstructExpr()</code>
      with proper support for the feature.</p>
      <p><i>(Difficulty: Medium)</i></p>
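      <p>As a minimal sketch of the situation described above (the <code>Field</code>
      and <code>Aggregate</code> types are hypothetical, introduced only for illustration),
      brace-initializing an aggregate runs a constructor for its member and
      lifetime-extends the temporary bound to its reference member, while no
      constructor of the aggregate itself is called:</p>

```cpp
#include <cassert>

// Hypothetical member type with a user-provided constructor.
struct Field {
  int value;
  Field(int v) : value(v + 1) {}
};

// An aggregate: it declares no constructors, so brace-initialization does
// not call a constructor of Aggregate itself.
struct Aggregate {
  Field field;     // initialized by a Field constructor call
  const int &ref;  // binding a temporary here requires lifetime extension
};

int aggregate_example() {
  // Field(int) runs for agg.field; the temporary holding 7 is bound to
  // agg.ref and lives as long as agg does.
  Aggregate agg{Field(41), 7};
  assert(agg.ref == 7);
  return agg.field.value + agg.ref;
}
```

      <p>Here <code>aggregate_example()</code> returns 49. To model it, an analysis has to
      know that the <code>Field</code> constructor is building <code>agg.field</code> and
      that the temporary outlives the full-expression.</p>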
    </li>

    <li>Handle array constructors.
      <p>When an array of objects is allocated (say, using
         <code>operator new[]</code> or defining a stack array),
         constructors for all elements of the array are called.
         We should model (potentially some of) such evaluations,
         and the same applies to destructors called from
         <code>operator delete[]</code>.
         See the test cases in <a href="https://github.com/llvm/llvm-project/tree/main/clang/test/Analysis/handle_constructors_with_new_array.cpp">handle_constructors_with_new_array.cpp</a>.
      </p>
      <p>
      Constructing an array requires invoking a (potentially unknown)
      number of constructors with the same construct-expression.
      Apart from the technical difficulties of juggling program points around
      correctly to avoid accidentally merging paths together, we'll have to
      decide when to exit the loop and how to widen it.
      Given that the constructor is going to be a default constructor,
      a nice 95% solution might be to execute exactly one constructor and
      then default-bind the resulting <code>LazyCompoundVal</code> to the whole array;
      it'll work whenever the default constructor doesn't touch global state
      but only initializes the object to various default values.
      But if, say, we're making an array of strings,
      depending on the implementation you might have to allocate a new buffer
      for each string, and in this case default-binding won't cut it.
      We might want to come up with an auxiliary analysis in order to perform
      widening of these simple loops more precisely.
      </p>
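      <p>The following sketch illustrates the case where evaluating a single
      constructor and default-binding its result would not be enough. The
      <code>Counted</code> type and the two counters are hypothetical; the point is
      only that the default constructor touches global state, so every array
      element ends up different:</p>

```cpp
#include <cassert>

static int constructed = 0;  // global state touched by the constructor
static int destroyed = 0;

// Hypothetical element type whose default constructor is not "pure".
struct Counted {
  int id;
  Counted() : id(constructed++) {}
  ~Counted() { ++destroyed; }
};

int array_example() {
  const int destroyed_before = destroyed;
  Counted stack_array[3];                // three default-constructor calls
  Counted *heap_array = new Counted[2];  // two more, one per element
  // Elements are constructed in index order, each seeing a different
  // counter value, so no two elements hold the same id.
  int id_span = heap_array[1].id - stack_array[0].id;
  delete[] heap_array;                   // two destructor calls
  assert(destroyed - destroyed_before == 2);
  return id_span;
}
```

      <p>After one call, <code>array_example()</code> has returned 4 and run five
      constructors and five destructors in total (the stack array is destroyed
      on return).</p>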
    </li>

    <li>Handle constructors that can be elided due to Named Return Value Optimization (NRVO)
      <p>Local variables which are returned by value from every return statement
         may be stored directly at the address for the return value,
         eliding the copy or move constructor call.
         Such variables can be identified using the AST call <code>VarDecl::isNRVOVariable</code>.
      </p>
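      <p>A small sketch of an NRVO candidate, using a hypothetical
      <code>Tracked</code> type that counts copy and move constructions. Since NRVO is
      an optional optimization, the counter may or may not stay at zero depending
      on the compiler and its flags, which is why the example only checks the
      returned payload:</p>

```cpp
#include <cassert>

// Hypothetical type that records how often it is copied or moved.
struct Tracked {
  static int copies_or_moves;
  int payload;
  explicit Tracked(int p) : payload(p) {}
  Tracked(const Tracked &o) : payload(o.payload) { ++copies_or_moves; }
  Tracked(Tracked &&o) : payload(o.payload) { ++copies_or_moves; }
};
int Tracked::copies_or_moves = 0;

Tracked make_tracked(bool flag) {
  Tracked result(flag ? 1 : 2);  // returned by value on every path
  if (flag)
    return result;               // may be constructed in the return slot
  result.payload += 10;
  return result;
}

void nrvo_example() {
  assert(make_tracked(true).payload == 1);
  assert(make_tracked(false).payload == 12);
}
```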
    </li>

    <li>Handle constructors of lambda captures
      <p>Variables which are captured by value into a lambda require a call to
         a copy constructor.
         This call is not currently modeled.
      </p>
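      <p>For illustration, the sketch below uses a hypothetical
      <code>Copyable</code> type with a counting copy constructor to make the
      copy performed by a by-value capture visible:</p>

```cpp
#include <cassert>

// Hypothetical type that counts copy-constructor calls.
struct Copyable {
  static int copies;
  int value;
  explicit Copyable(int v) : value(v) {}
  Copyable(const Copyable &o) : value(o.value) { ++copies; }
};
int Copyable::copies = 0;

int lambda_example() {
  Copyable original(5);
  // Capturing by value copy-constructs a Copyable inside the closure object.
  auto closure = [original]() { return original.value * 2; };
  assert(Copyable::copies >= 1);
  original.value = 100;  // the captured copy is unaffected
  return closure();
}
```

      <p>The call returns 10 rather than 200, because the closure works on the
      copy made at capture time.</p>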
    </li>

    <li>Handle constructors for default arguments
      <p>Default arguments in C++ are recomputed at every call,
         and are therefore local, and not static, variables.
         See the test cases in <a href="https://github.com/llvm/llvm-project/tree/main/clang/test/Analysis/handle_constructors_for_default_arguments.cpp">handle_constructors_for_default_arguments.cpp</a>.
      </p>
      <p>
      Default arguments are annoying because the initializer expression is
      evaluated at the call site but doesn't syntactically belong to the
      caller's AST; instead it belongs to the <code>ParmVarDecl</code> for the default
      parameter. This can lead to situations where the same expression has to
      carry different values simultaneously,
      when multiple calls to the same function are evaluated as part of the
      same full-expression without specifying the default arguments.
      Even simply calling the function twice (not necessarily within the
      same full-expression) may lead to program points agglutinating because
      it's the same expression. There are some nasty test cases already
      in <code>temporaries.cpp</code> (<code>struct DefaultParam</code> and so on). I recommend adding a
      new <code>LocationContext</code> kind specifically to deal with this problem. It'll
      also help you figure out the construction context when you evaluate the
      construct-expression (though you might still need to do some additional
      CFG work to get construction contexts right).
      </p>
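      <p>The problem can be sketched with a hypothetical <code>Widget</code> type
      (not taken from the test files mentioned above). The single default-argument
      expression <code>Widget()</code> is evaluated twice within one full-expression
      and produces two distinct temporaries:</p>

```cpp
#include <cassert>

// Hypothetical type whose constructor hands out increasing ids.
struct Widget {
  static int constructed;
  int id;
  Widget() : id(++constructed) {}
};
int Widget::constructed = 0;

// The expression Widget() belongs to the parameter declaration, yet it is
// evaluated anew at every call that omits the argument.
int widget_id(const Widget &w = Widget()) { return w.id; }

int default_argument_example() {
  const int before = Widget::constructed;
  // Two calls in one full-expression: the same default-argument expression
  // carries two different values (the sum is order-independent).
  int sum = widget_id() + widget_id();
  assert(Widget::constructed - before == 2);
  return sum;
}
```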
    </li>

    <li>Enhance the modeling of the standard library.
      <p>The analyzer needs a better understanding of the STL in order to be more
      useful on C++ codebases.
      While full library modeling is not an easy task,
      large gains can be achieved by supporting only a few cases:
      e.g. calling <code>.length()</code> on an empty
      <code>std::string</code> always yields zero.</p>
      <p><i>(Difficulty: Medium)</i></p>
    </li>

    <li>Enhance CFG to model exception-handling.
      <p>Currently, exceptions are treated as "black holes", and exception-handling
      control structures are poorly modeled in order to be conservative.
      This could be improved for both C++ and Objective-C exceptions.</p>
      <p><i>(Difficulty: Hard)</i></p>
    </li>
  </ul>
  </li>

  <li>Core Analyzer Infrastructure
  <ul>
    <li>Handle unions.
      <p>Currently, the analyzer always regards the value of a union as
      unknown.
      This problem was
      previously <a href="https://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html">discussed</a>
      on the mailing list, but no solution was implemented.</p>
      <p><i>(Difficulty: Medium)</i></p>
    </li>

    <li>Floating-point support.
      <p>Currently, the analyzer treats all floating-point values as unknown.
      This project would involve adding a new <code>SVal</code> kind
      for constant floats, generalizing the constraint manager to handle floats,
      and auditing existing code to make sure it doesn't
      make incorrect assumptions (most notably, that <code>X == X</code>
      is always true, since it does not hold for <code>NaN</code>).</p>
      <p><i>(Difficulty: Medium)</i></p>
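      <p>The <code>NaN</code> caveat can be checked with a few lines of standard
      C++ (the helper name <code>equals_itself</code> is made up for this sketch, and
      the result assumes IEEE 754 semantics, i.e. no fast-math style flags):</p>

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Reflexivity of == holds for ordinary values but not for NaN.
bool equals_itself(double x) { return x == x; }

void nan_example() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(equals_itself(1.5));
  assert(!equals_itself(nan));  // NaN compares unequal to itself
  assert(std::isnan(nan));
}
```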
    </li>

    <li>Improved loop execution modeling.
      <p>The analyzer simply unrolls each loop <tt>N</tt> times before
      dropping the path, for a fixed constant <tt>N</tt>.
      However, that results in lost coverage in cases where the loop always
      executes more than <tt>N</tt> times.
      A Google Summer of Code
      <a href="https://summerofcode.withgoogle.com/archive/2017/projects/6071606019358720/">project</a>
      was completed to make the loop bound parameterizable,
      but the <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">widening</a>
      problem still remains open.</p>

      <p><i>(Difficulty: Hard)</i></p>
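      <p>To make the coverage loss concrete, here is a hypothetical function
      (its name and the constant 100 are chosen only for illustration) in which
      everything after the loop is reachable only once the loop has run 100
      times. If path exploration stops after fewer iterations, the code after
      the loop is never examined:</p>

```cpp
#include <cassert>

int sum_after_long_loop(int divisor) {
  int sum = 0;
  for (int i = 0; i < 100; ++i)  // always runs exactly 100 iterations
    sum += i;
  assert(sum == 4950);
  // This point is reached only after all 100 iterations. A defect placed
  // here (say, an unguarded division) would go unreported by an analysis
  // that drops the path earlier.
  return divisor == 0 ? -1 : sum / divisor;
}
```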
    </li>

    <li>Basic function summarization support
      <p>The analyzer performs inter-procedural analysis using
      either inlining or "conservative evaluation" (invalidating all data
      passed to the function).
      Often, a very simple summary
      (e.g. "this function is <a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>") would be
      a large improvement over conservative evaluation.
      Such summaries could be obtained either syntactically,
      or using a dataflow framework.</p>
      <p><i>(Difficulty: Hard)</i></p>
    </li>

    <li>Implement a dataflow framework.
      <p>The analyzer core
      implements a <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>
      engine, which performs checks
      (use-after-free, uninitialized value read, etc.)
      over a <em>single</em> program path.
      However, many useful properties
      (dead code, check-after-use, etc.) require
      reasoning over <em>all</em> possible paths in a program.
      Such reasoning requires a
      <a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a> framework.
      Clang already implements
      a few dataflow analyses (most notably, liveness),
      but they are implemented in an ad-hoc fashion.
      A proper framework would enable us to write many more useful checkers.</p>
      <p><i>(Difficulty: Hard)</i></p>
    </li>

    <li>Track type information through casts more precisely.
      <p>The <code>DynamicTypePropagation</code>
      checker is in charge of inferring a region's
      dynamic type based on what operations the code is performing.
      Casts are a rich source of type information that the analyzer currently ignores.</p>
      <p><i>(Difficulty: Medium)</i></p>
    </li>

  </ul>
  </li>

  <li>Fixing miscellaneous bugs
    <p>Apart from the open projects listed above,
       contributors are welcome to fix any of the outstanding
       <a href="https://bugs.llvm.org/buglist.cgi?component=Static%20Analyzer&list_id=147756&product=clang&resolution=---">bugs</a>
       in Bugzilla.</p>
       <p><i>(Difficulty: Anything)</i></p>
  </li>

</ul>

</div>
</div>
</body>
</html>