==========================
Source-based Code Coverage
==========================

.. contents::
   :local:

Introduction
============

This document explains how to use clang's source-based code coverage feature.
It's called "source-based" because it operates on the AST and preprocessor
information directly. This allows it to generate very precise coverage data.

Clang ships two other code coverage implementations:

* :doc:`SanitizerCoverage` - A low-overhead tool meant for use alongside the
  various sanitizers. It can provide up to edge-level coverage.

* gcov - A GCC-compatible coverage implementation which operates on debug info.
  This is enabled by ``-ftest-coverage`` or ``--coverage``.

From this point onwards "code coverage" will refer to the source-based kind.

The code coverage workflow
==========================

The code coverage workflow consists of three main steps:

* Compiling with coverage enabled.

* Running the instrumented program.

* Creating coverage reports.

The next few sections work through a complete, copy-'n-paste friendly example
based on this program:

.. code-block:: cpp

    % cat <<EOF > foo.cc
    #define BAR(x) ((x) || (x))
    template <typename T> void foo(T x) {
      for (unsigned I = 0; I < 10; ++I) { BAR(I); }
    }
    int main() {
      foo<int>(0);
      foo<float>(0);
      return 0;
    }
    EOF

Compiling with coverage enabled
===============================

To compile code with coverage enabled, pass both ``-fprofile-instr-generate``
and ``-fcoverage-mapping`` to the compiler:

.. code-block:: console

    # Step 1: Compile with coverage enabled.
    % clang++ -fprofile-instr-generate -fcoverage-mapping foo.cc -o foo

Note that linking together code with and without coverage instrumentation is
supported. Uninstrumented code simply won't be accounted for in reports.
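
Conceptually, ``-fprofile-instr-generate`` adds counters to the program and
``-fcoverage-mapping`` records which source regions those counters describe.
To build intuition for the numbers that appear in the reports, here is a
sketch that hand-instruments ``foo`` from ``foo.cc`` with explicit counters.
The ``Counters`` struct and the ``counted`` and ``run_foo`` helpers are
hypothetical; clang's instrumentation uses its own counter layout and derives
some region counts from others rather than incrementing a counter per region.

.. code-block:: cpp

    // Hypothetical counters standing in for what the instrumentation records.
    struct Counters {
      unsigned long entry = 0;     // times foo<T> was entered
      unsigned long condition = 0; // times "I < 10" was evaluated
      unsigned long body = 0;      // times the loop body ran
      unsigned long bar_rhs = 0;   // times the right operand of BAR's "||" ran
    };

    // Bumps `counter`, then passes `value` through unchanged.
    template <typename V> V counted(unsigned long &counter, V value) {
      ++counter;
      return value;
    }

    // foo from foo.cc, with counter updates added by hand.
    template <typename T> void foo(T x, Counters &c) {
      (void)x; // unused, as in the original example
      ++c.entry;
      for (unsigned I = 0; counted(c.condition, I < 10); ++I) {
        ++c.body;
        // BAR(I) expands to ((I) || (I)); the right operand is evaluated only
        // when the left one is false, i.e. when I == 0.
        (void)((I) || counted(c.bar_rhs, I));
      }
    }

    // Runs one instantiation the way main() in foo.cc does.
    template <typename T> Counters run_foo() {
      Counters c;
      foo<T>(T(0), c);
      return c;
    }

Each instantiation records 1 entry, 11 evaluations of ``I < 10``, 10 loop
iterations, and a single evaluation of the right-hand ``(x)`` in ``BAR``.
Summed over ``foo<int>`` and ``foo<float>`` that is 22, 20, and 2, which lines
up with the counts in the ``llvm-cov show`` output for this program.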

Running the instrumented program
================================

The next step is to run the instrumented program. When the program exits, it
writes a **raw profile** to the path specified by the ``LLVM_PROFILE_FILE``
environment variable. If that variable is not set, the profile is written to
``default.profraw`` in the program's current directory. If
``LLVM_PROFILE_FILE`` contains a path to a non-existent directory, the missing
directory structure is created. Additionally, the following special **pattern
strings** are rewritten:

* "%p" expands out to the process ID.

* "%h" expands out to the hostname of the machine running the program.

* "%Nm" expands out to the instrumented binary's signature. When this pattern
  is specified, the runtime creates a pool of N raw profiles which are used for
  on-line profile merging. The runtime takes care of selecting a raw profile
  from the pool, locking it, and updating it before the program exits. If N is
  not specified (i.e., the pattern is "%m"), it's assumed that ``N = 1``. N must
  be between 1 and 9. The merge pool specifier can only occur once per filename
  pattern.

* "%c" expands out to nothing, but enables a mode in which profile counter
  updates are continuously synced to a file. This means that if the
  instrumented program crashes, or is killed by a signal, perfect coverage
  information can still be recovered. Continuous mode does not support value
  profiling for PGO, and is only supported on Darwin at the moment. Support for
  Linux may be mostly complete but requires testing, and support for
  Fuchsia/Windows may require more extensive changes: please get involved if
  you are interested in porting this feature.

.. code-block:: console

    # Step 2: Run the program.
    % LLVM_PROFILE_FILE="foo.profraw" ./foo

Creating coverage reports
=========================

Raw profiles have to be **indexed** before they can be used to generate
coverage reports. This is done using the "merge" tool in ``llvm-profdata``
(which can combine multiple raw profiles and index them at the same time):

.. code-block:: console

    # Step 3(a): Index the raw profile.
    % llvm-profdata merge -sparse foo.profraw -o foo.profdata

There are multiple different ways to render coverage reports. The simplest
option is to generate a line-oriented report:

.. code-block:: console

    # Step 3(b): Create a line-oriented coverage report.
    % llvm-cov show ./foo -instr-profile=foo.profdata

This report includes a summary view as well as dedicated sub-views for
templated functions and their instantiations. For our example program, we get
distinct views for ``foo<int>(...)`` and ``foo<float>(...)``. If
``-show-line-counts-or-regions`` is enabled, ``llvm-cov`` displays sub-line
region counts (even in macro expansions):

.. code-block:: none

        1|   20|#define BAR(x) ((x) || (x))
                                   ^20     ^2
        2|    2|template <typename T> void foo(T x) {
        3|   22|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
                                           ^22     ^20  ^20^20
        4|    2|}
    ------------------
    | void foo<int>(int):
    |      2|    1|template <typename T> void foo(T x) {
    |      3|   11|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
    |                                     ^11     ^10  ^10^10
    |      4|    1|}
    ------------------
    | void foo<float>(float):
    |      2|    1|template <typename T> void foo(T x) {
    |      3|   11|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
    |                                     ^11     ^10  ^10^10
    |      4|    1|}
    ------------------

To generate a file-level summary of coverage statistics instead of a
line-oriented report, try:

.. code-block:: console

    # Step 3(c): Create a coverage summary.
    % llvm-cov report ./foo -instr-profile=foo.profdata
    Filename          Regions   Missed Regions     Cover   Functions   Missed Functions  Executed     Lines   Missed Lines     Cover
    --------------------------------------------------------------------------------------------------------------------------------
    /tmp/foo.cc            13                0   100.00%           3                  0   100.00%        13              0   100.00%
    --------------------------------------------------------------------------------------------------------------------------------
    TOTAL                  13                0   100.00%           3                  0   100.00%        13              0   100.00%

The ``llvm-cov`` tool supports specifying a custom demangler, writing out
reports in a directory structure, and generating HTML reports. For the full
list of options, please refer to the `command guide
<https://llvm.org/docs/CommandGuide/llvm-cov.html>`_.

A few final notes:

* The ``-sparse`` flag is optional but can result in dramatically smaller
  indexed profiles. This option should not be used if the indexed profile will
  be reused for PGO.

* Raw profiles can be discarded after they are indexed. Advanced use of the
  profile runtime library allows an instrumented program to merge profiling
  information directly into an existing raw profile on disk. The details are
  out of scope.

* The ``llvm-profdata`` tool can be used to merge together multiple raw or
  indexed profiles. To combine profiling data from multiple runs of a program,
  try, e.g.:

  .. code-block:: console

      % llvm-profdata merge -sparse foo1.profraw foo2.profdata -o foo3.profdata

Exporting coverage data
=======================

Coverage data can be exported into JSON using the ``llvm-cov export``
sub-command. A comprehensive reference describing the high-level structure of
the exported data can be found in the llvm-cov source code.
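
Whichever output format is used, the counts come from the indexed profile,
which may combine several runs. Conceptually, ``llvm-profdata merge`` adds up
the counters recorded for each function. The sketch below models only that
summing step: ``Profile`` and ``merge_profiles`` are hypothetical and unrelated
to the on-disk formats, and the real tool also distinguishes functions by a
structural hash and handles details that are ignored here.

.. code-block:: cpp

    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Hypothetical in-memory profile: function name -> counter values.
    using Profile = std::map<std::string, std::vector<std::uint64_t>>;

    // Returns `into` with the counters of `from` added function by function.
    Profile merge_profiles(Profile into, const Profile &from) {
      for (const auto &[name, counters] : from) {
        auto [it, inserted] = into.try_emplace(name, counters);
        if (inserted)
          continue; // function appears only in `from`: keep its counters as-is
        std::vector<std::uint64_t> &dest = it->second;
        if (dest.size() != counters.size())
          throw std::runtime_error("counter count mismatch for " + name);
        for (std::size_t i = 0; i < counters.size(); ++i)
          dest[i] += counters[i];
      }
      return into;
    }

Under this model, merging two identical runs of ``foo.cc`` doubles every
count, so ``foo<int>`` would show 22 evaluations of ``I < 10`` instead of 11.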

Interpreting reports
====================

There are four statistics tracked in a coverage summary:

* Function coverage is the percentage of functions which have been executed at
  least once. A function is considered to be executed if any of its
  instantiations are executed.

* Instantiation coverage is the percentage of function instantiations which
  have been executed at least once. Template functions and static inline
  functions from headers are two kinds of functions which may have multiple
  instantiations.

* Line coverage is the percentage of code lines which have been executed at
  least once. Only executable lines within function bodies are considered to be
  code lines.

* Region coverage is the percentage of code regions which have been executed at
  least once. A code region may span multiple lines (e.g. in a large function
  body with no control flow). However, it's also possible for a single line to
  contain multiple code regions (e.g. in ``return x || y && z``).

Of these four statistics, function coverage is usually the least granular while
region coverage is the most granular. The project-wide totals for each
statistic are listed in the summary.

Format compatibility guarantees
===============================

* There are no backwards or forwards compatibility guarantees for the raw
  profile format. Raw profiles may be dependent on the specific compiler
  revision used to generate them. It's inadvisable to store raw profiles for
  long periods of time.

* Tools must retain **backwards** compatibility with indexed profile formats.
  These formats are not forwards-compatible: i.e., a tool which uses format
  version X will not be able to understand format version (X+k).

* Tools must also retain **backwards** compatibility with the format of the
  coverage mappings emitted into instrumented binaries. These formats are not
  forwards-compatible.

* The JSON coverage export format has a (major, minor, patch) version triple.
  Only a major version increment indicates a backwards-incompatible change. A
  minor version increment is for added functionality, and patch version
  increments are for bugfixes.

Using the profiling runtime without static initializers
=======================================================

By default the compiler runtime uses a static initializer to determine the
profile output path and to register a writer function. To collect profiles
without using static initializers, do this manually:

* Export an ``int __llvm_profile_runtime`` symbol from each instrumented shared
  library and executable. When the linker finds a definition of this symbol, it
  knows to skip loading the object which contains the profiling runtime's
  static initializer.

* Forward-declare ``void __llvm_profile_initialize_file(void)`` and call it
  once from each instrumented executable. This function parses
  ``LLVM_PROFILE_FILE``, sets the output path, and truncates any existing files
  at that path. To get the same behavior without truncating existing files,
  pass a filename pattern string to ``void __llvm_profile_set_filename(char *)``.
  These calls can be placed anywhere so long as they precede all calls to
  ``__llvm_profile_write_file``.

* Forward-declare ``int __llvm_profile_write_file(void)`` and call it to write
  out a profile. This function returns 0 when it succeeds, and a non-zero value
  otherwise. Calling this function multiple times appends profile data to an
  existing on-disk raw profile.

In C++ files, declare these as ``extern "C"``.
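
A minimal C++ sketch of the steps above for a single executable, assuming it
is built with ``-fprofile-instr-generate -fcoverage-mapping`` so that the
profiling runtime provides the two functions. Only the ``__llvm_profile_*``
names come from the description above; the ``init_coverage_profile`` and
``write_coverage_profile`` wrappers are hypothetical.

.. code-block:: cpp

    extern "C" {
    // Defining this symbol lets the linker skip the runtime object that holds
    // the profiling static initializer.
    int __llvm_profile_runtime;

    // Provided by the profiling runtime.
    void __llvm_profile_initialize_file(void);
    int __llvm_profile_write_file(void);
    }

    // Hypothetical wrapper: call once, before any call to
    // write_coverage_profile(). Parses LLVM_PROFILE_FILE, sets the output
    // path, and truncates any existing file at that path.
    inline void init_coverage_profile() { __llvm_profile_initialize_file(); }

    // Hypothetical wrapper: returns true if the raw profile was written.
    inline bool write_coverage_profile() {
      return __llvm_profile_write_file() == 0;
    }

    // Example use in the instrumented executable:
    //
    //   int main() {
    //     init_coverage_profile();
    //     // ... run the code under test ...
    //     return write_coverage_profile() ? 0 : 1;
    //   }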

Collecting coverage reports for the llvm project
================================================

To prepare a coverage report for llvm (and any of its sub-projects), add
``-DLLVM_BUILD_INSTRUMENTED_COVERAGE=On`` to the cmake configuration. Raw
profiles will be written to ``$BUILD_DIR/profiles/``. To prepare an HTML
report, run ``llvm/utils/prepare-code-coverage-artifact.py``.

To specify an alternate directory for raw profiles, use
``-DLLVM_PROFILE_DATA_DIR``. To change the size of the profile merge pool, use
``-DLLVM_PROFILE_MERGE_POOL_SIZE``.

Drawbacks and limitations
=========================

* Prior to version 2.26, the GNU binutils BFD linker is not able to link
  programs compiled with ``-fcoverage-mapping`` in its ``--gc-sections`` mode.
  Possible workarounds include disabling ``--gc-sections``, upgrading to a
  newer version of BFD, or using the Gold linker.

* Code coverage does not handle unpredictable changes in control flow or stack
  unwinding in the presence of exceptions precisely. Consider the following
  function:

  .. code-block:: cpp

      int f() {
        may_throw();
        return 0;
      }

  If the call to ``may_throw()`` throws, the exception propagates out of ``f``
  and the ``return`` statement never runs, yet the code coverage tool may still
  mark it as executed. A call to ``longjmp()`` can have similar effects.

Clang implementation details
============================

This section may be of interest to those wishing to understand or improve
the clang code coverage implementation.

Gap regions
-----------

Gap regions are source regions with counts. A reporting tool cannot set a line
execution count to the count from a gap region unless that region is the only
one on a line.
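
The rule above can be modelled with a small function. This is a simplified
sketch, not llvm-cov's implementation: ``LineRegion`` and ``line_count`` are
hypothetical, and taking the maximum count over the non-gap regions on a line
is an assumption made for illustration.

.. code-block:: cpp

    #include <algorithm>
    #include <optional>
    #include <vector>

    // Hypothetical, simplified view of one region that touches a source line.
    struct LineRegion {
      unsigned long count; // execution count of the region
      bool is_gap;         // true if this is a gap region
    };

    // Returns the execution count to show for a line, or std::nullopt if no
    // region on the line is allowed to set it.
    std::optional<unsigned long> line_count(const std::vector<LineRegion> &regions) {
      if (regions.size() == 1)
        return regions.front().count; // a lone region sets the count, gap or not
      std::optional<unsigned long> result;
      for (const LineRegion &r : regions) {
        if (r.is_gap)
          continue; // a gap region cannot set the count when it is not alone
        result = result ? std::max(*result, r.count) : r.count; // assumed: max
      }
      return result;
    }

With this rule, a covered statement followed by a zero-count gap region on the
same line yields the statement's count rather than 0.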

Gap regions are used to eliminate unnatural artifacts in coverage reports, such
as red "unexecuted" highlights present at the end of an otherwise covered line,
or blue "executed" highlights present at the start of a line that is otherwise
not executed.

Switch statements
-----------------

The region mapping for a switch body consists of a gap region that covers the
entire body (starting from the ``{`` in ``switch (...) {``, and terminating
where the last case ends). This gap region has a zero count: this causes "gap"
areas in between case statements, which contain no executable code, to appear
uncovered.

When a switch case is visited, the parent region is extended: if the parent
region has no start location, its start location becomes the start of the
case. This is used to support switch statements without a ``CompoundStmt``
body, in which the switch body and the single case share a count.

For switches with ``CompoundStmt`` bodies, a new region is created at the start
of each switch case.
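
The two switch shapes described above look like this in source. The functions
are hypothetical, and the comments restate the region layout from this
section; the regions clang actually emits can be inspected by building such a
file with coverage enabled and viewing it with ``llvm-cov show``.

.. code-block:: cpp

    // Switch with a CompoundStmt body: a zero-count gap region covers the body
    // from '{' to the end of the last case, and each case starts a new region.
    int classify(int n) {
      switch (n) {
      case 0:
        return 10;
      case 1:
      case 2: // reached directly, or by falling through from case 1
        return 20;
      default:
        return 30;
      }
    }

    // Switch whose body is a single case instead of a CompoundStmt: the switch
    // body and that case share one count.
    int only_zero(int n) {
      switch (n)
      case 0:
        return 1;
      return 0;
    }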