# User Guide

## Command Line

[Output Formats](#output-formats)

[Output Files](#output-files)

[Running Benchmarks](#running-benchmarks)

[Running a Subset of Benchmarks](#running-a-subset-of-benchmarks)

[Result Comparison](#result-comparison)

[Extra Context](#extra-context)

## Library

[Runtime and Reporting Considerations](#runtime-and-reporting-considerations)

[Setup/Teardown](#setupteardown)

[Passing Arguments](#passing-arguments)

[Custom Benchmark Name](#custom-benchmark-name)

[Calculating Asymptotic Complexity](#asymptotic-complexity)

[Templated Benchmarks](#templated-benchmarks)

[Templated Benchmarks that take arguments](#templated-benchmarks-with-arguments)

[Fixtures](#fixtures)

[Custom Counters](#custom-counters)

[Multithreaded Benchmarks](#multithreaded-benchmarks)

[CPU Timers](#cpu-timers)

[Manual Timing](#manual-timing)

[Setting the Time Unit](#setting-the-time-unit)

[Random Interleaving](random_interleaving.md)

[User-Requested Performance Counters](perf_counters.md)

[Preventing Optimization](#preventing-optimization)

[Reporting Statistics](#reporting-statistics)

[Custom Statistics](#custom-statistics)

[Memory Usage](#memory-usage)

[Using RegisterBenchmark](#using-register-benchmark)

[Exiting with an Error](#exiting-with-an-error)

[A Faster `KeepRunning` Loop](#a-faster-keep-running-loop)

## Benchmarking Tips

[Disabling CPU Frequency Scaling](#disabling-cpu-frequency-scaling)

[Reducing Variance in Benchmarks](reducing_variance.md)

<a name="output-formats" />

## Output Formats

The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag (or set the
`BENCHMARK_FORMAT=<console|json|csv>` environment variable) to set
the format type. `console` is the default format.
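
For example, with a hypothetical benchmark binary named `mybench` (any binary built
against the library behaves the same way):

```bash
# Select the JSON reporter via the flag...
$ ./mybench --benchmark_format=json
# ...or, equivalently, via the environment variable.
$ BENCHMARK_FORMAT=json ./mybench
```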

The Console format is intended to be a human readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:

```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human readable json split into two top level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example json
output looks like:

```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:

```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

<a name="output-files" />

## Output Files

Write benchmark results to a file with the `--benchmark_out=<filename>` option
(or set `BENCHMARK_OUT`). Specify the output format with
`--benchmark_out_format={json|console|csv}` (or set
`BENCHMARK_OUT_FORMAT={json|console|csv}`). Note that the 'csv' reporter is
deprecated and the saved `.csv` file
[is not parsable](https://github.com/google/benchmark/issues/794) by csv
parsers.

Specifying `--benchmark_out` does not suppress the console output.
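
For example, to keep the console table while also saving machine-readable JSON
(the binary and file names below are just placeholders):

```bash
# Writes JSON results to results.json; the console report is still printed.
$ ./mybench --benchmark_out=results.json --benchmark_out_format=json
```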

<a name="running-benchmarks" />

## Running Benchmarks

Benchmarks are executed by running the produced binaries. Benchmark binaries,
by default, accept options that may be specified either through their command
line interface or by setting environment variables before execution. For every
`--option_flag=<value>` CLI switch, a corresponding environment variable
`OPTION_FLAG=<value>` exists and is used as the default if set (CLI switches
always prevail). A complete list of CLI options is available by running the
benchmarks with the `--help` switch.

<a name="running-a-subset-of-benchmarks" />

## Running a Subset of Benchmarks

The `--benchmark_filter=<regex>` option (or `BENCHMARK_FILTER=<regex>`
environment variable) can be used to only run the benchmarks that match
the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```

## Disabling Benchmarks

It is possible to temporarily disable benchmarks by renaming the benchmark
function to have the prefix "DISABLED_". This will cause the benchmark to
be skipped at runtime.
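
A minimal sketch (the benchmark body and name here are purely illustrative):

```c++
// Renaming BM_SlowPath to DISABLED_BM_SlowPath is enough; the registration
// below stays unchanged and the benchmark is skipped at runtime.
static void DISABLED_BM_SlowPath(benchmark::State& state) {
  for (auto _ : state) {
    // ... code that should temporarily not be benchmarked ...
  }
}
BENCHMARK(DISABLED_BM_SlowPath);
```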

<a name="result-comparison" />

## Result Comparison

It is possible to compare the benchmarking results.
See [Additional Tooling Documentation](tools.md)

<a name="extra-context" />

## Extra Context

Sometimes it's useful to add extra context to the content printed before the
results. By default this section includes information about the CPU on which
the benchmarks are running. If you do want to add more context, you can use
the `benchmark_context` command line flag:

```bash
$ ./run_benchmarks --benchmark_context=pwd=`pwd`
Run on (1 x 2300 MHz CPU)
pwd: /home/user/benchmark/
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
```

You can get the same effect with the API:

```c++
  benchmark::AddCustomContext("foo", "bar");
```

Note that attempts to add a second value with the same key will fail with an
error message.

<a name="runtime-and-reporting-considerations" />

## Runtime and Reporting Considerations

When the benchmark binary is executed, each benchmark function is run serially.
The number of iterations to run is determined dynamically by running the
benchmark a few times, measuring the time taken, and ensuring that the
ultimate result will be statistically stable. As such, faster benchmark
functions will be run for more iterations than slower benchmark functions, and
the number of iterations is thus reported.

In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the number of
iterations is at least one and not more than 1e9, until CPU time is greater
than the minimum time, or the wallclock time is 5x the minimum time. The
minimum time is set per benchmark by calling `MinTime` on the registered
benchmark object.
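
For example, a sketch of the per-benchmark form, reusing the `BM_test` name from
the other examples (`MinTime` takes a value in seconds):

```c++
// Require at least 2 seconds of accumulated benchmark time before BM_test's
// result is reported, instead of the default minimum time.
BENCHMARK(BM_test)->MinTime(2.0);
```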

Furthermore, warming up a benchmark might be necessary in order to get
stable results, e.g. because of caching effects in the code under benchmark.
Warming up means running the benchmark for a given amount of time before
results are actually taken into account. The amount of time for which
the warmup should be run can be set per benchmark by calling
`MinWarmUpTime` on the registered benchmark object or for all benchmarks
using the `--benchmark_min_warmup_time` command-line option. Note that
`MinWarmUpTime` will overwrite the value of `--benchmark_min_warmup_time`
for that single benchmark. How many iterations the warmup run of each
benchmark takes is determined the same way as described in the paragraph
above. By default the warmup phase is set to 0 seconds and is therefore
disabled.
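
A sketch of the per-benchmark form, again reusing `BM_test` (the one-second
value is purely illustrative):

```c++
// Warm BM_test up for at least one second before measurements are recorded.
BENCHMARK(BM_test)->MinWarmUpTime(1.0);
```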

Average timings are then reported over the iterations run. If multiple
repetitions are requested using the `--benchmark_repetitions` command-line
option, or at registration time, the benchmark function will be run several
times and statistical results across these repetitions will also be reported.

As well as the per-benchmark entries, a preamble in the report will include
information about the machine on which the benchmarks are run.

<a name="setup-teardown" />

## Setup/Teardown

Global setup/teardown specific to each benchmark can be done by
passing a callback to Setup/Teardown:

The setup/teardown callbacks will be invoked once for each benchmark. If the
benchmark is multi-threaded (will run in k threads), they will be invoked
exactly once before each run with k threads.

If the benchmark uses different size groups of threads, the above will be true
for each size group.

E.g.,

```c++
static void DoSetup(const benchmark::State& state) {
}

static void DoTeardown(const benchmark::State& state) {
}

static void BM_func(benchmark::State& state) {...}

BENCHMARK(BM_func)->Arg(1)->Arg(3)->Threads(16)->Threads(32)->Setup(DoSetup)->Teardown(DoTeardown);

```

In this example, `DoSetup` and `DoTeardown` will be invoked 4 times each,
specifically, once for each member of this family:
 - BM_func_Arg_1_Threads_16, BM_func_Arg_1_Threads_32
 - BM_func_Arg_3_Threads_16, BM_func_Arg_3_Threads_32

<a name="passing-arguments" />

## Passing Arguments

Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(4<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```

Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

The preceding code shows a method of defining a sparse range. The following
example shows a method of defining a dense range. It is then used to benchmark
the performance of `std::vector` initialization for uniformly increasing sizes.

```c++
static void BM_DenseRange(benchmark::State& state) {
  for(auto _ : state) {
    std::vector<int> v(state.range(0), state.range(0));
    auto data = v.data();
    benchmark::DoNotOptimize(data);
    benchmark::ClobberMemory();
  }
}
BENCHMARK(BM_DenseRange)->DenseRange(0, 1024, 128);
```

Now arguments generated are [ 0, 128, 256, 384, 512, 640, 768, 896, 1024 ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

Some benchmarks may require specific argument values that cannot be expressed
with `Ranges`. In this case, `ArgsProduct` offers the ability to generate a
benchmark input for each combination in the product of the supplied vectors.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({{1<<10, 3<<10, 8<<10}, {20, 40, 60, 80}})
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 20})
    ->Args({3<<10, 20})
    ->Args({8<<10, 20})
    ->Args({3<<10, 40})
    ->Args({8<<10, 40})
    ->Args({1<<10, 40})
    ->Args({1<<10, 60})
    ->Args({3<<10, 60})
    ->Args({8<<10, 60})
    ->Args({1<<10, 80})
    ->Args({3<<10, 80})
    ->Args({8<<10, 80});
```
<!-- {% endraw %} -->

For the most common scenarios, helper methods for creating a list of
integers for a given sparse or dense range are provided.

```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      benchmark::CreateRange(8, 128, /*multi=*/2),
      benchmark::CreateDenseRange(1, 4, /*step=*/1)
    })
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      {8, 16, 32, 64, 128},
      {1, 2, 3, 4}
    });
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Passing Arbitrary Arguments to a Benchmark

In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...Args>
void BM_takes_args(benchmark::State& state, Args&&... args) {
  auto args_tuple = std::make_tuple(std::move(args)...);
  for (auto _ : state) {
    std::cout << std::get<0>(args_tuple) << ": " << std::get<1>(args_tuple)
              << '\n';
    [...]
  }
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));

// Registers a benchmark named "BM_takes_args/int_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_test, 42, 43);
```

Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

<a name="asymptotic-complexity" />

## Calculating Asymptotic Complexity (Big O)

Asymptotic complexity might be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    auto comparison_result = s1.compare(s2);
    benchmark::DoNotOptimize(comparison_result);
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity might also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code will specify asymptotic complexity with a lambda function
that can be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](benchmark::IterationCount n)->double{return n; });
```

<a name="custom-benchmark-name" />

## Custom Benchmark Name

You can change the benchmark's name as follows:

```c++
BENCHMARK(BM_memcpy)->Name("memcpy")->RangeMultiplier(2)->Range(8, 8<<10);
```

The invocation will execute the benchmark as before using `BM_memcpy` but changes
the prefix in the report to `memcpy`.

<a name="templated-benchmarks" />

## Templated Benchmarks

This example produces and consumes messages of size `sizeof(v)` `state.range(0)`
times. It also outputs throughput in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
// C++03
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);

// C++11 or newer, you can use the BENCHMARK macro with template parameters:
BENCHMARK(BM_Sequential<WaitQueue<int>>)->Range(1<<0, 1<<10);

```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK(func<...>) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

<a name="templated-benchmarks-with-arguments" />

## Templated Benchmarks that take arguments

Sometimes there is a need to template benchmarks and provide arguments to them.

```c++
template <class Q> void BM_Sequential_With_Step(benchmark::State& state, int step) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i-=step; )
      q.push(v);
    for (int e = state.range(0); e-=step; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}

BENCHMARK_TEMPLATE1_CAPTURE(BM_Sequential_With_Step, WaitQueue<int>, Step1, 1)->Range(1<<0, 1<<10);
```

<a name="fixtures" />

## Fixtures

Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For Example:

```c++
class MyFixture : public benchmark::Fixture {
public:
  void SetUp(::benchmark::State& state) {
  }

  void TearDown(::benchmark::State& state) {
  }
};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

### Templated Fixtures

You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:

```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

<a name="custom-counters" />

## Custom Counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.

The `Counter` constructor accepts three parameters: the value as a `double`;
a bit flag which allows you to show counters as rates, and/or as per-thread
iteration, and/or as per-thread averages, and/or iteration invariants,
and/or finally inverting the result; and a flag specifying the 'unit' - i.e.
is 1k a 1000 (default, `benchmark::Counter::OneK::kIs1000`), or 1024
(`benchmark::Counter::OneK::kIs1024`)?

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  // Meaning: per one second, how many 'foo's are processed?
  state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark, and the result inverted.
  // Meaning: how many seconds it takes to process one 'foo'?
  state.counters["FooInvRate"] = Counter(numFoos, benchmark::Counter::kIsRate | benchmark::Counter::kInvert);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = Counter(numFoos,benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] = Counter(state.range(0), benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024);
```

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

<!-- {% raw %} -->
```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```
<!-- {% endraw %} -->

### Counter Reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```

Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

<a name="multithreaded-benchmarks"/>

## Multithreaded Benchmarks

In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against the
thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index() == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index() == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

To run the benchmark across a range of thread counts, instead of `Threads`, use
`ThreadRange`. This takes two parameters (`min_threads` and `max_threads`) and
runs the benchmark once for values in the inclusive range. For example:

```c++
BENCHMARK(BM_MultiThreaded)->ThreadRange(1, 8);
```

will run `BM_MultiThreaded` with thread counts 1, 2, 4, and 8.

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

<a name="cpu-timers" />

## CPU Timers

By default, the CPU timer only measures the time spent by the main thread.
If the benchmark itself uses threads internally, this measurement may not
be what you are looking for. Instead, there is a way to measure the total
CPU usage of the process, by all the threads.

```c++
void callee(int i);

static void MyMain(int size) {
#pragma omp parallel for
  for(int i = 0; i < size; i++)
    callee(i);
}

static void BM_OpenMP(benchmark::State& state) {
  for (auto _ : state)
    MyMain(state.range(0));
}

// Measure the time spent by the main thread, use it to decide for how long to
// run the benchmark loop. Depending on internal implementation details, this may
// measure anywhere from near-zero (the overhead spent before/after work
// handoff to worker thread[s]) to the whole single-thread time.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10);

// Measure the user-visible time, the wall clock (literally, the time that
// has passed on the clock on the wall), use it to decide for how long to
// run the benchmark loop. This will always be meaningful, and will match the
// time spent by the main thread in single-threaded case, in general decreasing
// with the number of internal threads doing the work.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->UseRealTime();

// Measure the total CPU consumption, use it to decide for how long to
// run the benchmark loop. This will always measure to no less than the
// time spent by the main thread in single-threaded case.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime();

// A mixture of the last two. Measure the total CPU consumption, but use the
// wall clock to decide for how long to run the benchmark loop.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime()->UseRealTime();
```

### Controlling Timers

Normally, the entire duration of the work loop (`for (auto _ : state) {}`)
is measured. But sometimes, it is necessary to do some work inside of
that loop, every iteration, but without counting that time to the benchmark time.
That is possible, although it is not recommended, since it has high overhead.

<!-- {% raw %} -->
```c++
static void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured
    state.ResumeTiming(); // And resume timers. They are now counting again.
    // The rest will be measured.
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

<a name="manual-timing" />

## Manual Timing

For benchmarking something for which neither CPU time nor real-time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, they can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

<a name="setting-the-time-unit" />

## Setting the Time Unit

If a benchmark runs for a few milliseconds, it may be hard to visually compare
the measured times, since the output data is given in nanoseconds by default.
In order to change the time unit, you can specify it manually:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

Additionally the default time unit can be set globally with the
`--benchmark_time_unit={ns|us|ms|s}` command line argument. The argument only
affects benchmarks where the time unit is not set explicitly.
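
For example (the binary name is again just a placeholder):

```bash
# Report all benchmarks that don't set a unit explicitly in milliseconds.
$ ./mybench --benchmark_time_unit=ms
```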

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
    int x = 0;
    for (int i = 0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU-based compilers it acts as a read/write barrier
for global memory. More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of `<expr>` is only reused. */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence,
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    auto data = v.data();           // Allow v.data() to be clobbered. Pass as non-const
    benchmark::DoNotOptimize(data); // lvalue to avoid undesired compiler optimizations.
    v.push_back(42);
    benchmark::ClobberMemory();     // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU- or MSVC-based compilers.

<a name="reporting-statistics" />

## Statistics: Reporting the Mean, Median, Standard Deviation and Coefficient of Variation of Repeated Benchmarks

By default each benchmark is run once and that single result is reported.
However, benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per-benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once, the mean, median, standard deviation and coefficient of variation
of the runs are reported.

Additionally, the `--benchmark_report_aggregates_only={true|false}` and
`--benchmark_display_aggregates_only={true|false}` flags, or the
`ReportAggregatesOnly(bool)` and `DisplayAggregatesOnly(bool)` functions, can be
used to change how repeated tests are reported. By default the result of each
repeated run is reported. When the `report aggregates only` option is `true`,
only the aggregates (i.e. mean, median, standard deviation and coefficient
of variation, plus complexity measurements if they were requested) of the runs
are reported, to both reporters: the standard output (console) and the file.
However, when only the `display aggregates only` option is `true`,
only the aggregates are displayed on the standard output, while the file
output still contains everything.
Calling `ReportAggregatesOnly(bool)` / `DisplayAggregatesOnly(bool)` on a
registered benchmark object overrides the value of the appropriate flag for that
benchmark.
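
For example, the following registrations (a minimal sketch, reusing the
`BM_test` placeholder from the earlier examples) repeat a benchmark ten times
and control how the repeated results are reported:

```c++
// Run BM_test 10 times. Report only the aggregates (mean, median, standard
// deviation, coefficient of variation), both on the console and in the file.
BENCHMARK(BM_test)->Repetitions(10)->ReportAggregatesOnly(true);

// Run BM_test 10 times. Display only the aggregates on the console, but keep
// every individual repetition in the file output.
BENCHMARK(BM_test)->Repetitions(10)->DisplayAggregatesOnly(true);
```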

<a name="custom-statistics" />

## Custom Statistics

While having these aggregates is nice, they may not be enough for everyone.
For example, you may want to know what the largest observation is, e.g. because
you have real-time constraints. This is easy. The following code will
specify a custom statistic to be calculated, defined by a lambda function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

While usually the statistics produce values in time units,
you can also produce percentages:

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("ratio", [](const std::vector<double>& v) -> double {
    // E.g. report the ratio of the smallest to the largest observation.
    return *(std::min_element(std::begin(v), std::end(v))) /
           *(std::max_element(std::begin(v), std::end(v)));
  }, benchmark::StatisticUnit::kPercentage)
  ->Arg(512);
```

<a name="memory-usage" />

## Memory Usage

It's often useful to also track memory usage for benchmarks, alongside CPU
performance. For this reason, benchmark offers the `RegisterMemoryManager`
function, which allows a custom `MemoryManager` to be injected.

If set, the `MemoryManager::Start` and `MemoryManager::Stop` methods will be
called at the start and end of benchmark runs to allow user code to fill out
a report on the number of allocations, bytes used, etc.

This data will then be reported alongside other performance data, currently
only when using JSON output.
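
The following is a rough sketch of how the pieces fit together. The
`TestMemoryManager` class and the reported values are placeholders, and the
exact `MemoryManager::Stop` signature and `Result` fields have varied between
library versions, so check the `benchmark.h` header of your release.

```c++
#include <benchmark/benchmark.h>

class TestMemoryManager : public benchmark::MemoryManager {
 public:
  void Start() override {
    // Reset counters / start tracking allocations here, e.g. by hooking the
    // allocator.
  }
  void Stop(Result& result) override {
    // Report whatever was measured while the benchmark ran.
    result.num_allocs = 0;
    result.max_bytes_used = 0;
  }
};

int main(int argc, char** argv) {
  benchmark::Initialize(&argc, argv);
  TestMemoryManager mm;
  benchmark::RegisterMemoryManager(&mm);  // Inject before running benchmarks.
  benchmark::RunSpecifiedBenchmarks();
  benchmark::RegisterMemoryManager(nullptr);
  benchmark::Shutdown();
  return 0;
}
```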

<a name="using-register-benchmark" />

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at global
scope, `RegisterBenchmark` can be called anywhere. This allows benchmarks to be
registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
    benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
  benchmark::Shutdown();
}
```

<a name="exiting-with-an-error" />

## Exiting with an Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const std::string& msg)` function can be used to skip that
run of the benchmark and report the error. Note that only future iterations of
the `KeepRunning()` loop are skipped. For the ranged-for version of the benchmark
loop, users must explicitly exit the loop, otherwise all iterations will be
performed. Users may explicitly `return` to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop. Moreover, if `SkipWithError(...)`
has been used, it is not required to reach the benchmark loop and one may return
from the benchmark function early.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    return; // Early return is allowed when SkipWithError() has been used.
  }
  for (auto _ : state) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // REQUIRED to prevent all further iterations.
    }
    do_stuff(data);
  }
}
```

<a name="a-faster-keep-running-loop" />

## A Faster KeepRunning Loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The range-based for loop is faster than the `KeepRunning` loop because
`KeepRunning` requires a memory load and store of the iteration count on every
iteration, whereas the range-based variant is able to keep the iteration count
in a register.
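
For comparison, the same benchmark written with the explicit `KeepRunning`
loop looks like this (a sketch mirroring the example above; `BM_Slow` and
`FastOperation` are placeholders):

```c++
static void BM_Slow(benchmark::State &state) {
  while (state.KeepRunning()) {
    FastOperation();
  }
}
BENCHMARK(BM_Slow);
```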

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the range-based variant of writing
the benchmark loop should be preferred.

<a name="disabling-cpu-frequency-scaling" />

## Disabling CPU Frequency Scaling

If you see this warning:

```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may
be noisy and will incur extra overhead.
```

you might want to disable CPU frequency scaling while running the benchmark,
and consider other ways to stabilize the performance of your system while
benchmarking.

See [Reducing Variance](reducing_variance.md) for more information.