=====================================
Cross Translation Unit (CTU) Analysis
=====================================

Normally, static analysis works within the boundary of a single translation unit (TU).
However, with additional steps and configuration we can enable the analysis to inline the definition of a function from
another TU.

.. contents::
   :local:

Overview
________
CTU analysis can be used in a variety of ways. The importing of external TU definitions can work with pre-dumped PCH
files or by generating the necessary AST structure on-demand, during the analysis of the main TU. Driving the static
analysis can also be implemented in multiple ways. The most direct way is to specify the necessary command-line options
of the Clang frontend manually (and generate the prerequisite dependencies of the specific import method by hand). This
process can be automated by other tools, like `CodeChecker <https://github.com/Ericsson/codechecker>`_ and scan-build-py
(the former is preferred).

PCH-based analysis
__________________
The analysis needs the PCH dumps of all the translation units used in the project.
These can be generated by the Clang frontend itself, and must be arranged in a specific way in the filesystem.
The index, which maps symbols' USR names to the PCH dumps containing them, must also be generated by the
`clang-extdef-mapping` tool. Entries in the index *must* have an `.ast` suffix if the goal
is to use PCH-based analysis, as the lack of that extension signals that the entry is to be used as a source file and
parsed on-demand.
The `clang-extdef-mapping` tool uses a :doc:`compilation database <../../JSONCompilationDatabase>` to
determine the compilation flags used.
The analysis invocation must be provided with the directory which contains the dumps and the mapping files.


Manual CTU Analysis
###################
Let's consider these source files in our minimal example:

.. code-block:: cpp

  // main.cpp
  int foo();

  int main() {
    return 3 / foo();
  }

.. code-block:: cpp

  // foo.cpp
  int foo() {
    return 0;
  }

And a compilation database:

.. code-block:: json

  [
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c foo.cpp -o foo.o",
      "file": "foo.cpp"
    },
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c main.cpp -o main.o",
      "file": "main.cpp"
    }
  ]

We'd like to analyze `main.cpp` and discover the division by zero bug.
In order to be able to inline the definition of `foo` from `foo.cpp`, we first have to generate the `AST` (or `PCH`) file
of `foo.cpp`:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ -emit-ast -o foo.cpp.ast foo.cpp
  $ # Check that the .ast file is generated:
  $ ls
  compile_commands.json  foo.cpp.ast  foo.cpp  main.cpp
  $

The next step is to create a CTU index file which holds the `USR` name and location of external definitions in the
source files:

.. code-block:: bash

  $ clang-extdef-mapping -p . foo.cpp
  c:@F@foo# /path/to/your/project/foo.cpp
  $ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt

We have to modify `externalDefMap.txt` to contain the names of the `.ast` files instead of the source files:

.. code-block:: bash

  $ sed -i -e 's/\.cpp$/.cpp.ast/' externalDefMap.txt

We also have to modify the `externalDefMap.txt` file to contain relative paths:

.. code-block:: bash

  $ sed -i -e "s|$(pwd)/||g" externalDefMap.txt

Now everything is available for the CTU analysis.
We have to pass the CTU-specific extra arguments to Clang:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ --analyze \
      -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
      -Xclang -analyzer-config -Xclang ctu-dir=. \
      -Xclang -analyzer-output=plist-multi-file \
      main.cpp
  main.cpp:5:12: warning: Division by zero
    return 3 / foo();
           ~~^~~~~~~
  1 warning generated.
  $ # The plist file with the result is generated.
  $ ls -F
  compile_commands.json  externalDefMap.txt  foo.ast  foo.cpp  foo.cpp.ast  main.cpp  main.plist
  $

This manual procedure is error-prone and does not scale; to analyze real projects it is recommended to use
`CodeChecker` or `scan-build-py`.

Automated CTU Analysis with CodeChecker
#######################################
The `CodeChecker <https://github.com/Ericsson/codechecker>`_ project fully supports automated CTU analysis with Clang.
Once the `PATH` environment variable is set up and the Python `venv` is activated, this is all it takes:

.. code-block:: bash

  $ CodeChecker analyze --ctu compile_commands.json -o reports
  $ ls -F
  compile_commands.json  foo.cpp  foo.cpp.ast  main.cpp  reports/
  $ tree reports
  reports
  ├── compile_cmd.json
  ├── compiler_info.json
  ├── foo.cpp_53f6fbf7ab7ec9931301524b551959e2.plist
  ├── main.cpp_23db3d8df52ff0812e6e5a03071c8337.plist
  ├── metadata.json
  └── unique_compile_commands.json

  0 directories, 6 files
  $

The `plist` files contain the results of the analysis, which may be viewed with the regular analysis tools.
For example, one may use `CodeChecker parse` to view the results on the command line:

.. code-block:: bash

  $ CodeChecker parse reports
  [HIGH] /home/egbomrt/ctu_mini_raw_project/main.cpp:5:12: Division by zero [core.DivideZero]
    return 3 / foo();
             ^

  Found 1 defect(s) in main.cpp


  ----==== Summary ====----
  -----------------------
  Filename | Report count
  -----------------------
  main.cpp |            1
  -----------------------
  -----------------------
  Severity | Report count
  -----------------------
  HIGH     |            1
  -----------------------
  ----=================----
  Total number of reports: 1
  ----=================----

Or we can use `CodeChecker parse -e html` to export the results into HTML format:

.. code-block:: bash

  $ CodeChecker parse -e html -o html_out reports
  $ firefox html_out/index.html

Automated CTU Analysis with scan-build-py (don't do it)
#######################################################
We actively develop CTU with CodeChecker as the driver for this feature; `scan-build-py` is not actively developed for CTU.
`scan-build-py` has various errors and issues; expect it to work only with very basic projects.

Example usage of scan-build-py:

.. code-block:: bash

  $ /your/path/to/llvm-project/clang/tools/scan-build-py/bin/analyze-build --ctu
  analyze-build: Run 'scan-view /tmp/scan-build-2019-07-17-17-53-33-810365-7fqgWk' to examine bug reports.
  $ /your/path/to/llvm-project/clang/tools/scan-view/bin/scan-view /tmp/scan-build-2019-07-17-17-53-33-810365-7fqgWk
  Starting scan-view at: http://127.0.0.1:8181
    Use Ctrl-C to exit.
  [6336:6431:0717/175357.633914:ERROR:browser_process_sub_thread.cc(209)] Waited 5 ms for network service
  Opening in existing browser session.
  ^C
  $

.. _ctu-on-demand:

On-demand analysis
__________________
The analysis produces the necessary AST structure of external TUs during analysis. This requires the
exact compiler invocations for each TU, which can be generated by hand, or by tools driving the analyzer.
The compiler invocation is a shell command that could be used to compile the TU's main source file.
The mapping from the absolute source file path of a TU to the list of compilation command segments used to
compile said TU is given in YAML format, referred to as the `invocation list`, and must be passed as an
analyzer-config argument.
The index, which maps function USR names to the source files containing them, must also be generated by the
`clang-extdef-mapping` tool. Entries in the index must *not* have an `.ast` suffix if the goal
is to use On-demand analysis, as that extension signals that the entry is to be used as a PCH dump.
The mapping of external definitions implicitly uses a
:doc:`compilation database <../../JSONCompilationDatabase>` to determine the compilation flags used.
The analysis invocation must be provided with the directory which contains the mapping
files, and the `invocation list` which is used to determine compiler flags.


Manual CTU Analysis
###################

Let's consider these source files in our minimal example:

.. code-block:: cpp

  // main.cpp
  int foo();

  int main() {
    return 3 / foo();
  }

.. code-block:: cpp

  // foo.cpp
  int foo() {
    return 0;
  }

The compilation database:

.. code-block:: json

  [
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c foo.cpp -o foo.o",
      "file": "foo.cpp"
    },
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c main.cpp -o main.o",
      "file": "main.cpp"
    }
  ]

The `invocation list` (passed to the analyzer as `invocations.yaml` in the command further down):

.. code-block:: yaml

  "/path/to/your/project/foo.cpp":
    - "clang++"
    - "-c"
    - "/path/to/your/project/foo.cpp"
    - "-o"
    - "/path/to/your/project/foo.o"

  "/path/to/your/project/main.cpp":
    - "clang++"
    - "-c"
    - "/path/to/your/project/main.cpp"
    - "-o"
    - "/path/to/your/project/main.o"

We'd like to analyze `main.cpp` and discover the division by zero bug.
As we are using On-demand mode, we only need to create a CTU index file which holds the `USR` name and location of
external definitions in the source files:

.. code-block:: bash

  $ clang-extdef-mapping -p . foo.cpp
  c:@F@foo# /path/to/your/project/foo.cpp
  $ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt

Now everything is available for the CTU analysis.
We have to pass the CTU-specific extra arguments to Clang:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ --analyze \
      -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
      -Xclang -analyzer-config -Xclang ctu-dir=. \
      -Xclang -analyzer-config -Xclang ctu-invocation-list=invocations.yaml \
      -Xclang -analyzer-output=plist-multi-file \
      main.cpp
  main.cpp:5:12: warning: Division by zero
    return 3 / foo();
           ~~^~~~~~~
  1 warning generated.
  $ # The plist file with the result is generated.
  $ ls -F
  compile_commands.json  externalDefMap.txt  foo.cpp  main.cpp  main.plist
  $

This manual procedure is error-prone and does not scale; to analyze real projects it is recommended to use
`CodeChecker`.

Automated CTU Analysis with CodeChecker
#######################################
The `CodeChecker <https://github.com/Ericsson/codechecker>`_ project fully supports automated CTU analysis with Clang.
Once the `PATH` environment variable is set up and the Python `venv` is activated, this is all it takes:

.. code-block:: bash

  $ CodeChecker analyze --ctu --ctu-ast-loading-mode on-demand compile_commands.json -o reports
  $ ls -F
  compile_commands.json  foo.cpp  main.cpp  reports/
  $ tree reports
  reports
  ├── compile_cmd.json
  ├── compiler_info.json
  ├── foo.cpp_53f6fbf7ab7ec9931301524b551959e2.plist
  ├── main.cpp_23db3d8df52ff0812e6e5a03071c8337.plist
  ├── metadata.json
  └── unique_compile_commands.json

  0 directories, 6 files
  $

The `plist` files contain the results of the analysis, which may be viewed with the regular analysis tools.
For example, one may use `CodeChecker parse` to view the results on the command line:

.. code-block:: bash

  $ CodeChecker parse reports
  [HIGH] /home/egbomrt/ctu_mini_raw_project/main.cpp:5:12: Division by zero [core.DivideZero]
    return 3 / foo();
             ^

  Found 1 defect(s) in main.cpp


  ----==== Summary ====----
  -----------------------
  Filename | Report count
  -----------------------
  main.cpp |            1
  -----------------------
  -----------------------
  Severity | Report count
  -----------------------
  HIGH     |            1
  -----------------------
  ----=================----
  Total number of reports: 1
  ----=================----

Or we can use `CodeChecker parse -e html` to export the results into HTML format:

.. code-block:: bash

  $ CodeChecker parse -e html -o html_out reports
  $ firefox html_out/index.html

Automated CTU Analysis with scan-build-py (don't do it)
#######################################################
We actively develop CTU with CodeChecker as the driver for this feature; `scan-build-py` is not actively developed for CTU.
`scan-build-py` has various errors and issues; expect it to work only with very basic projects.

Currently On-demand analysis is not supported with `scan-build-py`.
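
When On-demand analysis is driven by hand instead, the `invocation list` does not have to be typed out manually: it can
be derived from the compilation database. The following is only a minimal sketch and not part of Clang or CodeChecker.
The helper name `build_invocation_list` is hypothetical, it assumes every database entry uses the `command` field
(rather than `arguments`), and it only rewrites the source-file argument to an absolute path, so other relative
arguments (such as the `-o` output) are left as they appear in the database.

```python
import json
import os
import shlex


def build_invocation_list(compile_commands):
    """Render a YAML invocation list from parsed compilation database entries.

    Hypothetical helper for illustration; assumes each entry has the
    "directory", "command" and "file" fields.
    """
    lines = []
    for entry in compile_commands:
        # Keys must be absolute source paths, so resolve "file" against "directory".
        source = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        lines.append(json.dumps(source) + ":")
        for arg in shlex.split(entry["command"]):
            # Point the source-file argument at the absolute path as well.
            if arg == entry["file"]:
                arg = source
            # json.dumps yields a double-quoted string, which YAML also accepts.
            lines.append("  - " + json.dumps(arg))
        # Blank line between entries, as in the hand-written list.
        lines.append("")
    return "\n".join(lines)
```

Passing the result of `json.load` on `compile_commands.json` to this function and writing the returned text to
`invocations.yaml` gives a file of the same shape as the hand-written list in the manual On-demand example.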