test-suite Guide
================

Quickstart
----------

1. The lit test runner is required to run the tests. You can either use one
   from an LLVM build:

   ```bash
   % <path to llvm build>/bin/llvm-lit --version
   lit 20.0.0dev
   ```

   An alternative is installing it as a Python package in a Python virtual
   environment:

   ```bash
   % python3 -m venv .venv
   % . .venv/bin/activate
   % pip install git+https://github.com/llvm/llvm-project.git#subdirectory=llvm/utils/lit
   % lit --version
   lit 20.0.0dev
   ```

   Installing the official Python release of lit in a Python virtual
   environment also works; this installs the most recent release of lit:

   ```bash
   % python3 -m venv .venv
   % . .venv/bin/activate
   % pip install lit
   % lit --version
   lit 18.1.8
   ```

   Please note that recent tests may rely on features not yet in the latest
   released lit. If in doubt, use one of the previous methods.

2. Check out the `test-suite` module with:

   ```bash
   % git clone https://github.com/llvm/llvm-test-suite.git test-suite
   ```

3. Create a build directory and use CMake to configure the suite. Use the
   `CMAKE_C_COMPILER` option to specify the compiler to test. Use a cache file
   to choose a typical build configuration:

   ```bash
   % mkdir test-suite-build
   % cd test-suite-build
   % cmake -DCMAKE_C_COMPILER=<path to llvm build>/bin/clang \
           -C../test-suite/cmake/caches/O3.cmake \
           ../test-suite
   ```

**NOTE!** If you are using your locally built clang and want to build and run
the MicroBenchmarks/XRay microbenchmarks, you need to add `compiler-rt` to the
`LLVM_ENABLE_RUNTIMES` CMake flag of your LLVM build.

4. Build the benchmarks:

   ```text
   % make
   Scanning dependencies of target timeit-target
   [  0%] Building C object tools/CMakeFiles/timeit-target.dir/timeit.c.o
   [  0%] Linking C executable timeit-target
   ...
   ```

5. Run the tests with lit:

   ```text
   % llvm-lit -v -j 1 -o results.json .
   -- Testing: 474 tests, 1 threads --
   PASS: test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test (1 of 474)
   ********** TEST 'test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test' RESULTS **********
   compile_time: 0.2192
   exec_time: 0.0462
   hash: "59620e187c6ac38b36382685ccd2b63b"
   size: 83348
   **********
   PASS: test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test (2 of 474)
   ...
   ```

**NOTE!** Even if you only want to collect compile-time results (code size,
LLVM statistics, etc.), you still need to run the tests with the `llvm-lit`
command above. In that case, the *results.json* file will contain the
compile-time metrics.

6. Show and compare result files (optional):

   ```bash
   # Make sure pandas and scipy are installed. Prepend `sudo` if necessary.
   % pip install pandas scipy
   # Show a single result file:
   % test-suite/utils/compare.py results.json
   # Compare two result files:
   % test-suite/utils/compare.py results_a.json results_b.json
   ```


Structure
---------

The test-suite contains benchmark and test programs.  The programs come with
reference outputs so that their correctness can be checked.  The suite comes
with tools to collect metrics such as benchmark runtime, compilation time and
code size.

The test-suite is divided into several directories:

-  `SingleSource/`

   Contains test programs that are only a single source file in size.  A
   subdirectory may contain several programs.

-  `MultiSource/`

   Contains subdirectories which contain entire programs with multiple source
   files.  Large benchmarks and whole applications go here.

-  `MicroBenchmarks/`

   Programs using the [google-benchmark](https://github.com/google/benchmark)
   library. The programs define functions that are run multiple times until the
   measurement results are statistically significant.

-  `External/`

   Contains descriptions and test data for code that cannot be directly
   distributed with the test-suite. The most prominent members of this
   directory are the SPEC CPU benchmark suites.
   See [External Suites](#external-suites).

-  `Bitcode/`

   These tests are mostly written in LLVM bitcode.

-  `CTMark/`

   Contains symbolic links to other benchmarks forming a representative sample
   for compilation performance measurements.

### Benchmarks

Every program can work as a correctness test. Some programs are unsuitable for
performance measurements. Setting the `TEST_SUITE_BENCHMARKING_ONLY` CMake
option to `ON` will disable them.

The MultiSource benchmarks consist of the following apps and benchmarks:

| MultiSource          | Language  | Application Area              | Remark               |
|----------------------|-----------|-------------------------------|----------------------|
| 7zip                 |  C/C++    | Compression/Decompression     |                      |
| ASCI_Purple          |  C        | SMG2000 benchmark and solver  | Memory intensive app |
| ASC_Sequoia          |  C        | Simulation and solver         |                      |
| BitBench             |  C        | uudecode/uuencode utility     | Bit Stream benchmark for functional compilers |
| Bullet               |  C++      | Bullet 2.75 physics engine    |                      |
| DOE-ProxyApps-C++    |  C++      | HPC/scientific apps           | Small applications, representative of our larger DOE workloads |
| DOE-ProxyApps-C      |  C        | HPC/scientific apps           | "                    |
| Fhourstones          |  C        | Game/solver                   | Integer benchmark that efficiently solves positions in the game of Connect-4 |
| Fhourstones-3.1      |  C        | Game/solver                   | "                    |
| FreeBench            |  C        | Benchmark suite               | Raytracer, four in a row, neural network, file compressor, Fast Fourier/Cosine/Sine Transform |
| llubenchmark         |  C        | Linked-list micro-benchmark   |                      |
| mafft                |  C        | Bioinformatics                | A multiple sequence alignment program |
| MallocBench          |  C        | Benchmark suite               | cfrac, espresso, gawk, gs, make, p2c, perl |
| McCat                |  C        | Benchmark suite               | Quicksort, bubblesort, eigenvalues |
| mediabench           |  C        | Benchmark suite               | adpcm, g721, gsm, jpeg, mpeg2 |
| MiBench              |  C        | Embedded benchmark suite      | Automotive, consumer, office, security, telecom apps |
| nbench               |  C        |                               | BYTE Magazine's BYTEmark benchmark program |
| NPB-serial           |  C        | Parallel computing            | Serial version of the NPB IS code |
| Olden                |  C        | Data Structures               | SGI version of the Olden benchmark |
| OptimizerEval        |  C        | Solver                        | Preston Briggs' optimizer evaluation framework |
| PAQ8p                |  C++      | Data compression              |                      |
| Prolangs-C++         |  C++      | Benchmark suite               | city, employ, life, NP, ocean, primes, simul, vcirc |
| Prolangs-C           |  C        | Benchmark suite               | agrep, archie-client, bison, gnugo, unix-smail |
| Ptrdist              |  C        | Pointer-Intensive Benchmark Suite |                  |
| Rodinia              |  C        | Scientific apps               | backprop, pathfinder, srad |
| SciMark2-C           |  C        | Scientific apps               | FFT, LU, Montecarlo, sparse matmul |
| sim                  |  C        | Dynamic programming           | A Time-Efficient, Linear-Space Local Similarity Algorithm |
| tramp3d-v4           |  C++      | Numerical analysis            | Template-intensive numerical program based on FreePOOMA |
| Trimaran             |  C        | Encryption                    | 3des, md5, crc |
| TSVC                 |  C        | Vectorization benchmark       | Test Suite for Vectorizing Compilers (TSVC) |
| VersaBench           |  C        | Benchmark suite               | 8b10b, beamformer, bmm, dbms, ecbdes |

All MultiSource applications are suitable for performance measurements
and will still run when the CMake option `TEST_SUITE_BENCHMARKING_ONLY` is
set to `ON`.

Configuration
-------------

The test-suite has configuration options to customize building and running the
benchmarks. CMake can print a list of them:

```bash
% cd test-suite-build
# Print basic options:
% cmake -LH
# Print all options:
% cmake -LAH
```

### Common Configuration Options

- `CMAKE_C_FLAGS`

  Specify extra flags to be passed to C compiler invocations.  The flags are
  also passed to the C++ compiler and linker invocations.  See
  [https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html)

- `CMAKE_C_COMPILER`

  Select the C compiler executable to be used. Note that the C++ compiler is
  inferred automatically, i.e. when specifying `path/to/clang`, CMake will
  automatically use `path/to/clang++` as the C++ compiler.  See
  [https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER.html](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER.html)

- `CMAKE_Fortran_COMPILER`

  Select the Fortran compiler executable to be used. Not set by default and not
  required unless running the Fortran tests.

- `CMAKE_BUILD_TYPE`

  Select a build type like `OPTIMIZE` or `DEBUG` that selects a set of
  predefined compiler flags. These flags are applied regardless of the
  `CMAKE_C_FLAGS` option and may be changed by modifying
  `CMAKE_C_FLAGS_OPTIMIZE` etc.  See
  [https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html](https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html)

- `TEST_SUITE_FORTRAN`

  Activate the Fortran tests. This is a work in progress. More information can
  be found in the
  [Flang documentation](https://flang.llvm.org/docs/FortranLLVMTestSuite.html).
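
  For example, a configure line for the Fortran tests might look like the
  following (the `flang` location is an assumption; substitute the path to
  your own Flang build):

  ```bash
  # Hypothetical flang path; adjust for your LLVM/Flang build.
  % cmake -DTEST_SUITE_FORTRAN=ON \
          -DCMAKE_Fortran_COMPILER=<path to llvm build>/bin/flang \
          ../test-suite
  ```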

- `TEST_SUITE_RUN_UNDER`

  Prefix test invocations with the given tool. This is typically used to run
  cross-compiled tests within a simulator tool.
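
  For example, a sketch of running cross-compiled AArch64 binaries under QEMU
  user-mode emulation (the `qemu-aarch64` tool and the sysroot path are
  assumptions about your setup):

  ```bash
  # Assumes qemu-aarch64 is installed; adjust the sysroot for your toolchain.
  % cmake -DTEST_SUITE_RUN_UNDER="qemu-aarch64 -L /usr/aarch64-linux-gnu" \
          ../test-suite
  ```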

- `TEST_SUITE_BENCHMARKING_ONLY`

  Disable tests that are unsuitable for performance measurements. The disabled
  tests either run for a very short time or are dominated by I/O performance,
  making them unsuitable as compiler performance tests.
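
  For example:

  ```bash
  % cmake -DTEST_SUITE_BENCHMARKING_ONLY=ON ../test-suite
  ```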

- `TEST_SUITE_SUBDIRS`

  Semicolon-separated list of directories to include. This can be used to only
  build parts of the test-suite or to include external suites.  This option
  does not work reliably with deeper subdirectories, as it skips intermediate
  `CMakeLists.txt` files which may be required.
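
  For example, to build only two of the top-level directories:

  ```bash
  % cmake -DTEST_SUITE_SUBDIRS="SingleSource;MicroBenchmarks" ../test-suite
  ```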

- `TEST_SUITE_COLLECT_STATS`

  Collect internal LLVM statistics. Appends `-save-stats=obj` when invoking the
  compiler and makes the lit runner collect and merge the statistic files.

- `TEST_SUITE_RUN_BENCHMARKS`

  If this is set to `OFF`, then lit will not actually run the tests but will
  just collect build statistics such as compile time and code size.
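
  Combined with `TEST_SUITE_COLLECT_STATS`, this allows a compile-time-only
  workflow; a sketch (no benchmarks are executed, but lit must still run to
  produce the result file):

  ```bash
  % cmake -DTEST_SUITE_RUN_BENCHMARKS=OFF \
          -DTEST_SUITE_COLLECT_STATS=ON \
          ../test-suite
  % make
  % llvm-lit -o compile_only.json .
  ```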

- `TEST_SUITE_USE_PERF`

  Use the `perf` tool for time measurement instead of the `timeit` tool that
  comes with the test-suite.  The `perf` tool is usually available on Linux
  systems.
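
  For example (assumes `perf` is installed and usable by your user):

  ```bash
  % cmake -DTEST_SUITE_USE_PERF=ON ../test-suite
  ```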

- `TEST_SUITE_SPEC2000_ROOT`, `TEST_SUITE_SPEC2006_ROOT`, `TEST_SUITE_SPEC2017_ROOT`, ...

  Specify installation directories of external benchmark suites. You can find
  more information about expected versions and usage in the README files in the
  `External` directory (such as `External/SPEC/README`).
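
  For example (the installation path is a placeholder):

  ```bash
  % cmake -DTEST_SUITE_SPEC2006_ROOT=path/to/speccpu2006 ../test-suite
  ```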

### Common CMake Flags

- `-GNinja`

  Generate build files for the ninja build tool.

- `-Ctest-suite/cmake/caches/<cachefile.cmake>`

  Use a CMake cache.  The test-suite comes with several CMake caches which
  predefine common or tricky build configurations.

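  For example, combining the ninja generator with the `O3.cmake` cache used in
  the quickstart:

  ```bash
  % cmake -GNinja -C../test-suite/cmake/caches/O3.cmake \
          -DCMAKE_C_COMPILER=<path to llvm build>/bin/clang \
          ../test-suite
  ```
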

Displaying and Analyzing Results
--------------------------------

The `compare.py` script displays and compares result files.  A result file is
produced when invoking lit with the `-o filename.json` flag.

Example usage:

- Basic Usage:

  ```text
  % test-suite/utils/compare.py baseline.json
  Warning: 'test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test' has No metrics!
  Tests: 508
  Metric: exec_time

  Program                                         baseline

  INT2006/456.hmmer/456.hmmer                   1222.90
  INT2006/464.h264ref/464.h264ref               928.70
  ...
               baseline
  count  506.000000
  mean   20.563098
  std    111.423325
  min    0.003400
  25%    0.011200
  50%    0.339450
  75%    4.067200
  max    1222.896800
  ```

- Show compile_time or text segment size metrics:

  ```bash
  % test-suite/utils/compare.py -m compile_time baseline.json
  % test-suite/utils/compare.py -m size.__text baseline.json
  ```

- Compare two result files and filter short running tests:

  ```bash
  % test-suite/utils/compare.py --filter-short baseline.json experiment.json
  ...
  Program                                         baseline  experiment  diff

  SingleSour.../Benchmarks/Linpack/linpack-pc     5.16      4.30        -16.5%
  MultiSourc...erolling-dbl/LoopRerolling-dbl     7.01      7.86         12.2%
  SingleSour...UnitTests/Vectorizer/gcc-loops     3.89      3.54        -9.0%
  ...
  ```

- Merge multiple baseline and experiment result files by taking the minimum
  runtime for each test:

  ```bash
  % test-suite/utils/compare.py base0.json base1.json base2.json vs exp0.json exp1.json exp2.json
  ```

### Continuous Tracking with LNT

LNT is a set of client and server tools for continuously monitoring
performance. You can find more information at
[https://llvm.org/docs/lnt](https://llvm.org/docs/lnt). The official LNT instance
of the LLVM project is hosted at [http://lnt.llvm.org](http://lnt.llvm.org).


External Suites
---------------

External suites such as SPEC can be enabled by either

- placing (or linking) them into the `test-suite/test-suite-externals/xxx`
  directory (example: `test-suite/test-suite-externals/speccpu2000`), or
- using a configuration option such as `-D TEST_SUITE_SPEC2000_ROOT=path/to/speccpu2000`

You can find further information in the respective README files such as
`test-suite/External/SPEC/README`.

For the SPEC benchmarks you can switch between the `test`, `train` and
`ref` input datasets via the `TEST_SUITE_RUN_TYPE` configuration option.
The `train` dataset is used by default.
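
For example, to run the SPEC suites with the `ref` dataset (the SPEC2017
installation path is a placeholder):

```bash
% cmake -DTEST_SUITE_SPEC2017_ROOT=path/to/speccpu2017 \
        -DTEST_SUITE_RUN_TYPE=ref \
        ../test-suite
```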

In addition to SPEC, the multimedia frameworks ffmpeg and dav1d can also be
hooked up as external projects in the same way. Including them in
llvm-test-suite compiles a lot more potentially vectorizable code, which can
catch compiler bugs merely by triggering code generation asserts. Including
them also adds small code correctness tests that compare the output of the
compiler-generated functions against handwritten assembly functions. (On x86,
building the assembly requires having the nasm tool available.) The
integration into llvm-test-suite doesn't run the projects' full test suites,
though. The projects also contain microbenchmarks for measuring the
performance of some functions. See the `README.md` files in the respective
`ffmpeg` and `dav1d` directories under `llvm-test-suite/External` for further
details.


Custom Suites
-------------

You can build custom suites using the test-suite infrastructure. A custom suite
has a `CMakeLists.txt` file at the top directory. The `CMakeLists.txt` will be
picked up automatically if placed into a subdirectory of the test-suite or when
setting the `TEST_SUITE_SUBDIRS` variable:

```bash
% cmake -DTEST_SUITE_SUBDIRS=path/to/my/benchmark-suite ../test-suite
```


Profile Guided Optimization
---------------------------

Profile guided optimization requires compiling and running the benchmarks
twice. First, the benchmarks should be compiled with profile generation
instrumentation enabled and set up to use the training data. The lit runner
will merge the profile files using `llvm-profdata` so they can be used by the
second compilation run.

Example:
```bash
# Profile generation run using LLVM IR PGO:
% cmake -DTEST_SUITE_PROFILE_GENERATE=ON \
        -DTEST_SUITE_USE_IR_PGO=ON \
        -DTEST_SUITE_RUN_TYPE=train \
        ../test-suite
% make
% llvm-lit .
# Use the profile data for compilation and actual benchmark run:
% cmake -DTEST_SUITE_PROFILE_GENERATE=OFF \
        -DTEST_SUITE_PROFILE_USE=ON \
        -DTEST_SUITE_RUN_TYPE=ref \
        .
% make
% llvm-lit -o result.json .
```

To use the Clang frontend's PGO instead of LLVM IR PGO, set
`-DTEST_SUITE_USE_IR_PGO=OFF`.

The `TEST_SUITE_RUN_TYPE` setting only affects the SPEC benchmark suites.


Cross Compilation and External Devices
--------------------------------------

### Compilation

CMake supports cross-compiling to a different target via toolchain files. More
information can be found here:

- [https://llvm.org/docs/lnt/tests.html#cross-compiling](https://llvm.org/docs/lnt/tests.html#cross-compiling)

- [https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html](https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html)

Cross compilation from macOS to iOS is possible with the
`test-suite/cmake/caches/target-target-*-iphoneos-internal.cmake` CMake cache
files; this requires an internal iOS SDK.

### Running

There are two ways to run the tests in a cross compilation setting:

- Via SSH connection to an external device: The `TEST_SUITE_REMOTE_HOST` option
  should be set to the SSH hostname.  The executables and data files need to be
  transferred to the device after compilation.  This is typically done via the
  `rsync` make target.  After this, the lit runner can be used on the host
  machine. It will prefix the benchmark and verification command lines with an
  `ssh` command.

  Example:

  ```bash
  % cmake -G Ninja -D CMAKE_C_COMPILER=path/to/clang \
          -C ../test-suite/cmake/caches/target-arm64-iphoneos-internal.cmake \
          -D CMAKE_BUILD_TYPE=Release \
          -D TEST_SUITE_REMOTE_HOST=mydevice \
          ../test-suite
  % ninja
  % ninja rsync
  % llvm-lit -j1 -o result.json .
  ```

- You can specify a simulator for the target machine with the
  `TEST_SUITE_RUN_UNDER` setting. The lit runner will prefix all benchmark
  invocations with it.


Running the test-suite via LNT
------------------------------

The LNT tool can run the test-suite. Use this when submitting test results to
an LNT instance.  See
[https://llvm.org/docs/lnt/tests.html#llvm-cmake-test-suite](https://llvm.org/docs/lnt/tests.html#llvm-cmake-test-suite)
for details.

Running the test-suite via Makefiles (deprecated)
-------------------------------------------------

**Note**: The test-suite comes with a set of Makefiles that are considered
deprecated.  They do not support newer testing modes like `Bitcode` or
`Microbenchmarks` and are harder to use.

Old documentation is available in the
[test-suite Makefile Guide](TestSuiteMakefileGuide).
490