<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Open Projects</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->
<div id="content">

<h1>Open Projects</h1>

<p>This page lists several projects that would boost the analyzer's usability and
power. Most of the projects listed here are infrastructure-related, so this list
complements the <a href="potential_checkers.html">potential checkers
list</a>. If you are interested in tackling one of these, please send an email
to the <a href="https://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev
mailing list</a> to notify other members of the community.</p>

<ul>
  <li>Release checkers from "alpha"
    <p>New checkers which were contributed to the analyzer,
    but have not passed a rigorous evaluation process,
    are committed as "alpha checkers" (from "alpha version"),
    and are not enabled by default.</p>

    <p>Ideally, only the checkers which are actively being worked on should be in
    "alpha",
    but over the years the development of many of those has stalled.
    Such checkers should either be improved
    up to a point where they can be enabled by default,
    or removed from the analyzer entirely.

    <ul>
      <li><code>alpha.security.ArrayBound</code> and
      <code>alpha.security.ArrayBoundV2</code>
      <p>Array bounds checking is a desired feature,
      but having an acceptable rate of false positives might not be possible
      without proper
      <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">loop widening</a> support.
      Additionally, it might be more promising to perform index checking based on
      <a href="https://en.wikipedia.org/wiki/Taint_checking">tainted</a> index values.
      <p><i>(Difficulty: Medium)</i></p></p>
      </li>
    </ul>
  </li>

  <li>Improve C++ support
  <ul>
    <li>Handle construction as part of aggregate initialization.
      <p><a href="https://en.cppreference.com/w/cpp/language/aggregate_initialization">Aggregates</a>
      are objects that can be brace-initialized without calling a
      constructor (that is, <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html">
      CXXConstructExpr</a></code> does not occur in the AST),
      although constructors for their fields and base classes
      may still be called.
      These
      constructors of sub-objects need to know what object they are constructing.
      Moreover, if the aggregate contains
      references, lifetime extension needs to be properly modeled.

      One can start untangling this problem by trying to replace the
      current ad-hoc <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html">
      ParentMap</a></code> lookup in the <a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430">
      <code>CXXConstructionKind::NonVirtualBase</code></a> branch of
      <code>ExprEngine::VisitCXXConstructExpr()</code>
      with proper support for the feature.
      <p><i>(Difficulty: Medium)</i></p></p>
    </li>

    <li>Handle array constructors.
      <p>When an array of objects is allocated (say, using
      <code>operator new[]</code> or defining a stack array),
      constructors for all elements of the array are called.
      We should model (potentially some of) such evaluations,
      and the same applies to destructors called from
      <code>operator delete[]</code>.
      See test cases in <a href="https://github.com/llvm/llvm-project/tree/main/clang/test/Analysis/handle_constructors_with_new_array.cpp">handle_constructors_with_new_array.cpp</a>.
      </p>
      <p>
      Constructing an array requires invoking a (potentially unknown)
      number of constructors with the same construct-expression.
      Apart from the technical difficulties of juggling program points around
      correctly to avoid accidentally merging paths together, we'll have to
      decide when to exit the loop and how to widen it.
      Given that the constructor is going to be a default constructor,
      a nice 95% solution might be to execute exactly one constructor and
      then default-bind the resulting LazyCompoundVal to the whole array;
      it'll work whenever the default constructor doesn't touch global state
      but only initializes the object to various default values.
      But if, say, we're making an array of strings,
      depending on the implementation you might have to allocate a new buffer
      for each string, and in this case default-binding won't cut it.
      We might want to come up with an auxiliary analysis in order to perform
      widening of these simple loops more precisely.
      </p>
    </li>

    <li>Handle constructors that can be elided due to Named Return Value Optimization (NRVO)
      <p>Local variables which are returned by value in all return statements
      may be stored directly at the address for the return value,
      eliding the copy or move constructor call.
      Such variables can be identified using the AST call <code>VarDecl::isNRVOVariable</code>.
      </p>
    </li>

    <li>Handle constructors of lambda captures
      <p>Variables which are captured by value into a lambda require a call to
      a copy constructor.
      This call is not currently modeled.
      </p>
    </li>

    <li>Handle constructors for default arguments
      <p>Default arguments in C++ are recomputed at every call,
      and are therefore local, and not static, variables.
      See test cases in <a href="https://github.com/llvm/llvm-project/tree/main/clang/test/Analysis/handle_constructors_for_default_arguments.cpp">handle_constructors_for_default_arguments.cpp</a>.
      </p>
      <p>
      Default arguments are annoying because the initializer expression is
      evaluated at the call site but doesn't syntactically belong to the
      caller's AST; instead it belongs to the ParmVarDecl for the default
      parameter. This can lead to situations where the same expression has to
      carry different values simultaneously, namely
      when multiple calls to the same function are evaluated as part of the
      same full-expression without specifying the default arguments.
      Even simply calling the function twice (not necessarily within the
      same full-expression) may lead to program points being merged because
      it's the same expression. There are some nasty test cases already
      in temporaries.cpp (struct DefaultParam and so on). I recommend adding a
      new LocationContext kind specifically to deal with this problem. It'll
      also help you figure out the construction context when you evaluate the
      construct-expression (though you might still need to do some additional
      CFG work to get construction contexts right).
      </p>
    </li>

    <li>Enhance the modeling of the standard library.
      <p>The analyzer needs a better understanding of the STL in order to be more
      useful on C++ codebases.
      While full library modeling is not an easy task,
      large gains can be achieved by supporting only a few cases:
      e.g. calling <code>.length()</code> on an empty
      <code>std::string</code> always yields zero.
      <p><i>(Difficulty: Medium)</i></p></p>
    </li>

    <li>Enhance CFG to model exception-handling.
      <p>Currently exceptions are treated as "black holes", and exception-handling
      control structures are poorly modeled in order to be conservative.
      This could be improved for both C++ and Objective-C exceptions.
      <p><i>(Difficulty: Hard)</i></p></p>
    </li>
  </ul>
  </li>

  <li>Core Analyzer Infrastructure
  <ul>
    <li>Handle unions.
      <p>Currently, the analyzer always regards the value of a union as
      unknown.
      This problem was
      previously <a href="https://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html">discussed</a>
      on the mailing list, but no solution was implemented.
      <p><i>(Difficulty: Medium)</i></p></p>
    </li>

    <li>Floating-point support.
      <p>Currently, the analyzer treats all floating-point values as unknown.
      This project would involve adding a new <code>SVal</code> kind
      for constant floats, generalizing the constraint manager to handle floats,
      and auditing existing code to make sure it doesn't
      make incorrect assumptions (most notably, that <code>X == X</code>
      is always true, since it does not hold for <code>NaN</code>).
      <p><i>(Difficulty: Medium)</i></p></p>
    </li>

    <li>Improved loop execution modeling.
      <p>The analyzer simply unrolls each loop <tt>N</tt> times before
      dropping the path, for a fixed constant <tt>N</tt>.
      However, that results in lost coverage in cases where the loop always
      executes more than <tt>N</tt> times.
      A Google Summer of Code
      <a href="https://summerofcode.withgoogle.com/archive/2017/projects/6071606019358720/">project</a>
      was completed to make the loop bound parameterizable,
      but the <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">widening</a>
      problem still remains open.

      <p><i>(Difficulty: Hard)</i></p></p>
    </li>

    <li>Basic function summarization support
      <p>The analyzer performs inter-procedural analysis using
      either inlining or "conservative evaluation" (invalidating all data
      passed to the function).
      Often, a very simple summary
      (e.g. "this function is <a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>") would be
      enough to be a large improvement over conservative evaluation.
      Such summaries could be obtained either syntactically,
      or using a dataflow framework.
      <p><i>(Difficulty: Hard)</i></p></p>
    </li>

    <li>Implement a dataflow framework.
      <p>The analyzer core
      implements a <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>
      engine, which performs checks
      (use-after-free, uninitialized value read, etc.)
      over a <em>single</em> program path.
      However, many useful properties
      (dead code, check-after-use, etc.) require
      reasoning over <em>all</em> possible paths in a program.
      Such reasoning requires a
      <a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a> framework.
      Clang already implements
      a few dataflow analyses (most notably, liveness),
      but they are implemented in an ad-hoc fashion.
      A proper framework would enable us to write many more useful checkers.
      <p><i>(Difficulty: Hard)</i></p></p>
    </li>

    <li>Track type information through casts more precisely.
      <p>The <code>DynamicTypePropagation</code>
      checker is in charge of inferring a region's
      dynamic type based on what operations the code is performing.
      Casts are a rich source of type information that the analyzer currently ignores.
      <p><i>(Difficulty: Medium)</i></p></p>
    </li>

  </ul>
  </li>

  <li>Fixing miscellaneous bugs
    <p>Apart from the open projects listed above,
    contributors are welcome to fix any of the outstanding
    <a href="https://bugs.llvm.org/buglist.cgi?component=Static%20Analyzer&list_id=147756&product=clang&resolution=---">bugs</a>
    in Bugzilla.
    <p><i>(Difficulty: Anything)</i></p></p>
  </li>

</ul>

</div>
</div>
</body>
</html>