<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Open Projects</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->
<div id="content">

<h1>Open Projects</h1>

<p>This page lists several projects that would boost the analyzer's usability and
power. Most of the projects listed here are infrastructure-related, so this list
complements the <a href="potential_checkers.html">potential checkers
list</a>. If you are interested in tackling one of these, please send an email
to the <a href="https://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev
mailing list</a> to notify other members of the community.</p>

<ul>
  <li>Release checkers from "alpha"
    <p>New checkers which were contributed to the analyzer,
    but have not passed a rigorous evaluation process,
    are committed as "alpha checkers" (from "alpha version"),
    and are not enabled by default.</p>

    <p>Ideally, only the checkers which are actively being worked on should be in
    "alpha",
    but over the years the development of many of those has stalled.
    Such checkers should either be improved
    up to a point where they can be enabled by default,
    or removed from the analyzer entirely.</p>

    <ul>
      <li><code>alpha.security.ArrayBound</code> and
      <code>alpha.security.ArrayBoundV2</code>
        <p>Array bounds checking is a desired feature,
        but having an acceptable rate of false positives might not be possible
        without proper
        <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">loop widening</a> support.
        Additionally, it might be more promising to perform index checking based on
        <a href="https://en.wikipedia.org/wiki/Taint_checking">tainted</a> index values.</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>

      <li><code>alpha.unix.StreamChecker</code>
        <p>A SimpleStreamChecker has been presented in the "Building a Checker in 24
        Hours" talk
        (<a href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
        <a href="https://youtu.be/kdxlsP5QVPw">video</a>).</p>

        <p>This alpha checker is an attempt to write a production-grade stream checker.
        However, it was found to have an unacceptably high false positive rate.
        One of the problems found was that eagerly splitting the state
        based on whether the system call may fail leads to too many reports.
        A <em>delayed</em> split where the implication is stored in the state
        (similarly to nullability implications in <code>TrustNonnullChecker</code>)
        may produce much better results.</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>
    </ul>
  </li>

  <li>Improve C++ support
    <ul>
      <li>Handle construction as part of aggregate initialization.
        <p><a href="https://en.cppreference.com/w/cpp/language/aggregate_initialization">Aggregates</a>
        are objects that can be brace-initialized without calling a
        constructor (that is, <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html">CXXConstructExpr</a></code>
        does not occur in the AST),
        but that may still call
        constructors for their fields and base classes.
        These
        constructors of sub-objects need to know what object they are constructing.
        Moreover, if the aggregate contains
        references, lifetime extension needs to be properly modeled.</p>

        <p>One can start untangling this problem by trying to replace the
        current ad-hoc <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html">ParentMap</a></code>
        lookup in the <a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430"><code>CXXConstructExpr::CK_NonVirtualBase</code></a> branch of
        <code>ExprEngine::VisitCXXConstructExpr()</code>
        with proper support for the feature.</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>

      <li>Handle array constructors.
        <p>When an array of objects is allocated (say, using
        <code>operator new[]</code> or by defining a stack array),
        constructors for all elements of the array are called.
        We should model (potentially some of) such evaluations,
        and the same applies for destructors called from
        <code>operator delete[]</code>.
        See test cases in <a href="https://github.com/llvm/llvm-project/tree/main/clang/test/Analysis/handle_constructors_with_new_array.cpp">handle_constructors_with_new_array.cpp</a>.
        </p>
        <p>
        Constructing an array requires invoking a (potentially unknown)
        number of constructors with the same construct-expression.
        Apart from the technical difficulties of juggling program points around
        correctly to avoid accidentally merging paths together, we'll have to
        decide when to exit the loop and how to widen it.
        Given that the constructor is going to be a default constructor,
        a nice 95% solution might be to execute exactly one constructor and
        then default-bind the resulting <code>LazyCompoundVal</code> to the whole array;
        it'll work whenever the default constructor doesn't touch global state
        but only initializes the object to various default values.
        But if, say, we're making an array of strings,
        depending on the implementation you might have to allocate a new buffer
        for each string, and in this case default-binding won't cut it.
        We might want to come up with an auxiliary analysis in order to perform
        widening of these simple loops more precisely.
        </p>
      </li>

      <li>Handle constructors that can be elided due to Named Return Value Optimization (NRVO)
        <p>Local variables which are returned by value in all return statements
        may be stored directly at the address for the return value,
        eliding the copy or move constructor call.
        Such variables can be identified using the AST call <code>VarDecl::isNRVOVariable</code>.
        </p>
      </li>

      <li>Handle constructors of lambda captures
        <p>Variables which are captured by value into a lambda require a call to
        a copy constructor.
        This call is not currently modeled.
        </p>
      </li>

      <li>Handle constructors for default arguments
        <p>Default arguments in C++ are recomputed at every call,
        and are therefore local, and not static, variables.
        See test cases in <a href="https://github.com/llvm/llvm-project/tree/main/clang/test/Analysis/handle_constructors_for_default_arguments.cpp">handle_constructors_for_default_arguments.cpp</a>.
        </p>
        <p>
        Default arguments are annoying because the initializer expression is
        evaluated at the call site but doesn't syntactically belong to the
        caller's AST; instead it belongs to the <code>ParmVarDecl</code> for the default
        parameter. This can lead to situations where the same expression has to
        carry different values simultaneously,
        namely when multiple calls to the same function are evaluated as part of the
        same full-expression without specifying the default arguments.
        Even simply calling the function twice (not necessarily within the
        same full-expression) may lead to program points agglutinating because
        it's the same expression. There are some nasty test cases already
        in <code>temporaries.cpp</code> (<code>struct DefaultParam</code> and so on). I recommend adding a
        new <code>LocationContext</code> kind specifically to deal with this problem. It'll
        also help you figure out the construction context when you evaluate the
        construct-expression (though you might still need to do some additional
        CFG work to get construction contexts right).
        </p>
      </li>

      <li>Enhance the modeling of the standard library.
        <p>The analyzer needs a better understanding of the STL in order to be more
        useful on C++ codebases.
        While full library modeling is not an easy task,
        large gains can be achieved by supporting only a few cases:
        e.g. calling <code>.length()</code> on an empty
        <code>std::string</code> always yields zero.</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>

      <li>Enhance the CFG to model exception handling.
        <p>Currently exceptions are treated as "black holes", and exception-handling
        control structures are poorly modeled in order to be conservative.
        This could be improved for both C++ and Objective-C exceptions.</p>
        <p><i>(Difficulty: Hard)</i></p>
      </li>
    </ul>
  </li>

  <li>Core Analyzer Infrastructure
    <ul>
      <li>Handle unions.
        <p>Currently, the analyzer always treats the value of a union as
        unknown.
        This problem was
        previously <a href="https://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html">discussed</a>
        on the mailing list, but no solution was implemented.</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>

      <li>Floating-point support.
        <p>Currently, the analyzer treats all floating-point values as unknown.
        This project would involve adding a new <code>SVal</code> kind
        for constant floats, generalizing the constraint manager to handle floats,
        and auditing existing code to make sure it doesn't
        make incorrect assumptions (most notably, that <code>X == X</code>
        is always true, since it does not hold for <code>NaN</code>).</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>

      <li>Improved loop execution modeling.
        <p>The analyzer simply unrolls each loop <tt>N</tt> times before
        dropping the path, for a fixed constant <tt>N</tt>.
        However, that results in lost coverage in cases where the loop always
        executes more than <tt>N</tt> times.
        A Google Summer of Code
        <a href="https://summerofcode.withgoogle.com/archive/2017/projects/6071606019358720/">project</a>
        was completed to make the loop bound parameterizable,
        but the <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">widening</a>
        problem still remains open.
        </p>
        <p><i>(Difficulty: Hard)</i></p>
      </li>

      <li>Basic function summarization support
        <p>The analyzer performs inter-procedural analysis using
        either inlining or "conservative evaluation" (invalidating all data
        passed to the function).
        Often, a very simple summary
        (e.g. "this function is <a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>") would
        already be a large improvement over conservative evaluation.
        Such summaries could be obtained either syntactically,
        or using a dataflow framework.</p>
        <p><i>(Difficulty: Hard)</i></p>
      </li>

      <li>Implement a dataflow framework.
        <p>The analyzer core
        implements a <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>
        engine, which performs checks
        (use-after-free, uninitialized value read, etc.)
        over a <em>single</em> program path.
        However, many useful properties
        (dead code, check-after-use, etc.) require
        reasoning over <em>all</em> possible paths in a program.
        Such reasoning requires a
        <a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a> framework.
        Clang already implements
        a few dataflow analyses (most notably, liveness),
        but they are implemented in an ad-hoc fashion.
        A proper framework would enable us to write many more useful checkers.
        </p>
        <p><i>(Difficulty: Hard)</i></p>
      </li>

      <li>Track type information through casts more precisely.
        <p>The <code>DynamicTypePropagation</code>
        checker is in charge of inferring a region's
        dynamic type based on what operations the code is performing.
        Casts are a rich source of type information that the analyzer currently ignores.</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>

    </ul>
  </li>

  <li>Fixing miscellaneous bugs
    <p>Apart from the open projects listed above,
    contributors are welcome to fix any of the outstanding
    <a href="https://bugs.llvm.org/buglist.cgi?component=Static%20Analyzer&list_id=147756&product=clang&resolution=---">bugs</a>
    in Bugzilla.</p>
    <p><i>(Difficulty: Anything)</i></p>
  </li>

</ul>

</div>
</div>
</body>
</html>