# User Guide

## Command Line

[Output Formats](#output-formats)

[Output Files](#output-files)

[Running Benchmarks](#running-benchmarks)

[Running a Subset of Benchmarks](#running-a-subset-of-benchmarks)

[Result Comparison](#result-comparison)

[Extra Context](#extra-context)

## Library

[Runtime and Reporting Considerations](#runtime-and-reporting-considerations)

[Setup/Teardown](#setup-teardown)

[Passing Arguments](#passing-arguments)

[Custom Benchmark Name](#custom-benchmark-name)

[Calculating Asymptotic Complexity](#asymptotic-complexity)

[Templated Benchmarks](#templated-benchmarks)

[Templated Benchmarks that take arguments](#templated-benchmarks-with-arguments)

[Fixtures](#fixtures)

[Custom Counters](#custom-counters)

[Multithreaded Benchmarks](#multithreaded-benchmarks)

[CPU Timers](#cpu-timers)

[Manual Timing](#manual-timing)

[Setting the Time Unit](#setting-the-time-unit)

[Random Interleaving](random_interleaving.md)

[User-Requested Performance Counters](perf_counters.md)

[Preventing Optimization](#preventing-optimization)

[Reporting Statistics](#reporting-statistics)

[Custom Statistics](#custom-statistics)

[Memory Usage](#memory-usage)

[Using RegisterBenchmark](#using-register-benchmark)

[Exiting with an Error](#exiting-with-an-error)

[A Faster `KeepRunning` Loop](#a-faster-keep-running-loop)

## Benchmarking Tips

[Disabling CPU Frequency Scaling](#disabling-cpu-frequency-scaling)

[Reducing Variance in Benchmarks](reducing_variance.md)

<a name="output-formats" />

## Output Formats

The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag (or set the
`BENCHMARK_FORMAT=<console|json|csv>` environment variable) to set
the format type. `console` is the default format.

The Console format is intended to be a human-readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:

```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:

```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:

```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

<a name="output-files" />

## Output Files

Write benchmark results to a file with the `--benchmark_out=<filename>` option
(or set `BENCHMARK_OUT`). Specify the output format with
`--benchmark_out_format={json|console|csv}` (or set
`BENCHMARK_OUT_FORMAT={json|console|csv}`). Note that the `csv` reporter is
deprecated and the saved `.csv` file
[is not parsable](https://github.com/google/benchmark/issues/794) by csv
parsers.

Specifying `--benchmark_out` does not suppress the console output.

<a name="running-benchmarks" />

## Running Benchmarks

Benchmarks are executed by running the produced binaries. By default, benchmark
binaries accept options that may be specified either through their command
line interface or by setting environment variables before execution. For every
`--option_flag=<value>` CLI switch, a corresponding environment variable
`OPTION_FLAG=<value>` exists and is used as the default if set (CLI switches
always prevail). A complete list of CLI options is available by running the
benchmark binary with the `--help` switch.
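
The naming convention and precedence described above can be sketched in a few
lines. This is only an illustration, not the library's own flag parser:
`FlagToEnvVar` and `ResolveOption` are hypothetical helper names, and a null
pointer stands in for "not provided".

```c++
#include <cctype>
#include <string>

// Hypothetical helper: derives the environment variable name from a CLI
// switch, e.g. "--benchmark_filter=<regex>" -> "BENCHMARK_FILTER".
std::string FlagToEnvVar(const std::string& flag) {
  const std::size_t start = flag.find_first_not_of('-');
  if (start == std::string::npos) return "";
  // Drop leading dashes and anything from '=' onwards.
  std::string env = flag.substr(start);
  env = env.substr(0, env.find('='));
  for (char& c : env)
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  return env;
}

// Hypothetical helper: a CLI value always wins, the environment value is the
// default when set, and the built-in fallback is used otherwise.
std::string ResolveOption(const char* cli_value, const char* env_value,
                          const std::string& fallback) {
  if (cli_value != nullptr) return cli_value;
  if (env_value != nullptr) return env_value;
  return fallback;
}
```

For instance, `ResolveOption(nullptr, std::getenv("BENCHMARK_FORMAT"), "console")`
would yield the environment value when the variable is set and `console`
otherwise.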

<a name="running-a-subset-of-benchmarks" />

## Running a Subset of Benchmarks

The `--benchmark_filter=<regex>` option (or `BENCHMARK_FILTER=<regex>`
environment variable) can be used to run only the benchmarks that match
the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```

## Disabling Benchmarks

It is possible to temporarily disable benchmarks by renaming the benchmark
function to have the prefix `DISABLED_`. This will cause the benchmark to
be skipped at runtime.

<a name="result-comparison" />

## Result Comparison

It is possible to compare the benchmarking results.
See [Additional Tooling Documentation](tools.md).

<a name="extra-context" />

## Extra Context

Sometimes it's useful to add extra context to the content printed before the
results. By default this section includes information about the CPU on which
the benchmarks are running. If you want to add more context, you can use
the `--benchmark_context` command line flag:

```bash
$ ./run_benchmarks --benchmark_context=pwd=`pwd`
Run on (1 x 2300 MHz CPU)
pwd: /home/user/benchmark/
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
```

You can get the same effect with the API:

```c++
  benchmark::AddCustomContext("foo", "bar");
```

Note that attempts to add a second value with the same key will fail with an
error message.

<a name="runtime-and-reporting-considerations" />

## Runtime and Reporting Considerations

When the benchmark binary is executed, each benchmark function is run serially.
The number of iterations to run is determined dynamically by running the
benchmark a few times, measuring the time taken, and ensuring that the
ultimate result will be statistically stable. As such, faster benchmark
functions will be run for more iterations than slower benchmark functions, and
the number of iterations is thus reported.

In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the benchmark
runs at least one and at most 1e9 iterations, stopping once the CPU time is
greater than the minimum time or the wallclock time is 5x the minimum time. The
minimum time is set per benchmark by calling `MinTime` on the registered
benchmark object.
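
The stopping rule above can be written down as a small predicate. This is a
sketch of the rule as stated in the text, not the library's internal code:
`RunProgress` and `HasRunLongEnough` are hypothetical names, and treating the
thresholds as inclusive (`>=`) is an assumption.

```c++
#include <cstdint>

// Hypothetical snapshot of a benchmark run in progress.
struct RunProgress {
  std::int64_t iterations;  // iterations executed so far
  double cpu_seconds;       // accumulated CPU time
  double real_seconds;      // accumulated wallclock time
};

// Returns true once the run satisfies the rule described above: at least one
// iteration, and either the 1e9 iteration cap, the minimum CPU time, or
// 5x the minimum time in wallclock terms has been reached.
bool HasRunLongEnough(const RunProgress& p, double min_time_seconds) {
  const std::int64_t kMaxIterations = 1000000000;  // 1e9
  if (p.iterations < 1) return false;
  return p.iterations >= kMaxIterations ||
         p.cpu_seconds >= min_time_seconds ||
         p.real_seconds >= 5 * min_time_seconds;
}
```

With a minimum time of 0.5 seconds, a run that has used 0.1 s of CPU time but
2.6 s of wallclock time would stop because of the 5x wallclock bound.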

Furthermore, warming up a benchmark might be necessary in order to get
stable results, e.g. because of caching effects in the code under benchmark.
Warming up means running the benchmark for a given amount of time before
results are actually taken into account. The amount of time for which
the warmup should be run can be set per benchmark by calling
`MinWarmUpTime` on the registered benchmark object or for all benchmarks
using the `--benchmark_min_warmup_time` command-line option. Note that
`MinWarmUpTime` will override the value of `--benchmark_min_warmup_time`
for that single benchmark. How many iterations the warmup run of each
benchmark takes is determined the same way as described in the paragraph
above. By default the warmup phase is set to 0 seconds and is therefore
disabled.

Average timings are then reported over the iterations run. If multiple
repetitions are requested using the `--benchmark_repetitions` command-line
option, or at registration time, the benchmark function will be run several
times and statistical results across these repetitions will also be reported.

As well as the per-benchmark entries, a preamble in the report will include
information about the machine on which the benchmarks are run.

<a name="setup-teardown" />

## Setup/Teardown

Global setup/teardown specific to each benchmark can be done by
passing a callback to `Setup`/`Teardown`.

The setup/teardown callbacks will be invoked once for each benchmark. If the
benchmark is multi-threaded (will run in k threads), they will be invoked
exactly once before each run with k threads.

If the benchmark uses different size groups of threads, the above will be true
for each size group.

For example:

```c++
static void DoSetup(const benchmark::State& state) {
}

static void DoTeardown(const benchmark::State& state) {
}

static void BM_func(benchmark::State& state) {...}

BENCHMARK(BM_func)->Arg(1)->Arg(3)->Threads(16)->Threads(32)->Setup(DoSetup)->Teardown(DoTeardown);

```

In this example, `DoSetup` and `DoTeardown` will be invoked 4 times each,
specifically, once for each member of this family:

 - BM_func_Arg_1_Threads_16, BM_func_Arg_1_Threads_32
 - BM_func_Arg_3_Threads_16, BM_func_Arg_3_Threads_32

<a name="passing-arguments" />

## Passing Arguments

Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(4<<10)->Arg(8<<10);
```

The preceding registration is quite repetitive, and can be replaced with the
following short-hand. The following invocation will pick a few appropriate
arguments in the specified range and will generate a benchmark for each such
argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```

Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].
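
To make the selection rule concrete, here is a standalone sketch that
reproduces the two argument lists above. It is not the library's code:
`SparseRangeSketch` is a hypothetical name, and the rule it implements (both
endpoints plus every power of the multiplier strictly between them) is an
assumption that happens to match the examples in this section. It expects
`1 <= lo <= hi` and `mult >= 2`.

```c++
#include <cstdint>
#include <vector>

// Hypothetical helper: returns lo, every power of `mult` strictly between lo
// and hi, and finally hi. Preconditions: 1 <= lo <= hi and mult >= 2.
std::vector<std::int64_t> SparseRangeSketch(std::int64_t lo, std::int64_t hi,
                                            std::int64_t mult) {
  std::vector<std::int64_t> out;
  out.push_back(lo);
  for (std::int64_t p = 1; p < hi;) {
    if (p > lo) out.push_back(p);
    if (p > hi / mult) break;  // the next power would exceed hi (or overflow)
    p *= mult;
  }
  if (hi != lo) out.push_back(hi);
  return out;
}
```

Calling `SparseRangeSketch(8, 8 << 10, 8)` gives 8, 64, 512, 4096, 8192, and
using a multiplier of 2 gives the eleven values listed above.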

The preceding code shows a method of defining a sparse range.  The following
example shows a method of defining a dense range. It is then used to benchmark
the performance of `std::vector` initialization for uniformly increasing sizes.

```c++
static void BM_DenseRange(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v(state.range(0), state.range(0));
    auto data = v.data();
    benchmark::DoNotOptimize(data);
    benchmark::ClobberMemory();
  }
}
BENCHMARK(BM_DenseRange)->DenseRange(0, 1024, 128);
```

Now arguments generated are [ 0, 128, 256, 384, 512, 640, 768, 896, 1024 ].
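
For comparison with the sparse case, the dense list above amounts to a plain
arithmetic progression with an inclusive upper limit. The helper below is a
hypothetical illustration (`DenseRangeSketch` is not a library function) and
assumes a positive step.

```c++
#include <cstdint>
#include <vector>

// Hypothetical helper: start, start + step, ... up to and including limit.
// Precondition: step > 0.
std::vector<std::int64_t> DenseRangeSketch(std::int64_t start,
                                           std::int64_t limit,
                                           std::int64_t step) {
  std::vector<std::int64_t> out;
  for (std::int64_t v = start; v <= limit; v += step) out.push_back(v);
  return out;
}
```

`DenseRangeSketch(0, 1024, 128)` produces the nine values shown above.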

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding registration is quite repetitive, and can be replaced with the
following short-hand. The following macro will pick a few appropriate arguments
in the product of the two specified ranges and will generate a benchmark for
each such pair.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->

Some benchmarks may require specific argument values that cannot be expressed
with `Ranges`. In this case, `ArgsProduct` offers the ability to generate a
benchmark input for each combination in the product of the supplied vectors.

<!-- {% raw %} -->
```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({{1<<10, 3<<10, 8<<10}, {20, 40, 60, 80}});
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 20})
    ->Args({3<<10, 20})
    ->Args({8<<10, 20})
    ->Args({1<<10, 40})
    ->Args({3<<10, 40})
    ->Args({8<<10, 40})
    ->Args({1<<10, 60})
    ->Args({3<<10, 60})
    ->Args({8<<10, 60})
    ->Args({1<<10, 80})
    ->Args({3<<10, 80})
    ->Args({8<<10, 80});
```
<!-- {% endraw %} -->

For the most common scenarios, helper methods for creating a list of
integers for a given sparse or dense range are provided.

```c++
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      benchmark::CreateRange(8, 128, /*multi=*/2),
      benchmark::CreateDenseRange(1, 4, /*step=*/1)
    });
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->ArgsProduct({
      {8, 16, 32, 64, 128},
      {1, 2, 3, 4}
    });
```
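
The "product of the supplied vectors" is an ordinary Cartesian product. The
sketch below enumerates it independently of the library, so the resulting
argument tuples can be inspected directly. `CartesianProductSketch` is a
hypothetical name, and letting the first list vary fastest is an assumption
chosen to match the order of the `Args` calls listed above.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns every combination that takes one value from
// each input list, with the first list varying fastest.
std::vector<std::vector<std::int64_t>> CartesianProductSketch(
    const std::vector<std::vector<std::int64_t>>& lists) {
  std::vector<std::vector<std::int64_t>> out;
  if (lists.empty()) return out;
  for (const auto& list : lists)
    if (list.empty()) return out;  // an empty list yields no combinations

  std::vector<std::size_t> idx(lists.size(), 0);
  while (true) {
    std::vector<std::int64_t> combo;
    for (std::size_t k = 0; k < lists.size(); ++k)
      combo.push_back(lists[k][idx[k]]);
    out.push_back(combo);

    // Advance the indices like an odometer whose first digit moves fastest.
    std::size_t k = 0;
    while (k < lists.size() && ++idx[k] == lists[k].size()) {
      idx[k] = 0;
      ++k;
    }
    if (k == lists.size()) break;  // every index wrapped around: done
  }
  return out;
}
```

For the lists `{1<<10, 3<<10, 8<<10}` and `{20, 40, 60, 80}` this yields
3 x 4 = 12 argument pairs, one per generated benchmark.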

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Passing Arbitrary Arguments to a Benchmark

In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...Args>
void BM_takes_args(benchmark::State& state, Args&&... args) {
  auto args_tuple = std::make_tuple(std::move(args)...);
  for (auto _ : state) {
    std::cout << std::get<0>(args_tuple) << ": " << std::get<1>(args_tuple)
              << '\n';
    [...]
  }
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));

// Registers the same benchmark "BM_takes_args/int_test" that passes
// the specified values to `args`.
BENCHMARK_CAPTURE(BM_takes_args, int_test, 42, 43);
```

Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

<a name="asymptotic-complexity" />

## Calculating Asymptotic Complexity (Big O)

Asymptotic complexity can be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    auto comparison_result = s1.compare(s2);
    benchmark::DoNotOptimize(comparison_result);
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity can also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code specifies asymptotic complexity with a lambda function,
which can be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](benchmark::IterationCount n) -> double { return static_cast<double>(n); });
```
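
As a rough picture of what "the coefficient for the high-order term" and "the
normalized root-mean-square error" mean, the sketch below fits
`time ≈ coef * curve(n)` by least squares through the origin. It is a
standalone illustration under that assumption, not the library's
implementation; `FitResult` and `FitComplexitySketch` are hypothetical names.

```c++
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: fitted coefficient and normalized RMS error.
struct FitResult {
  double coef;
  double rms;
};

// Fits time[i] ~= coef * curve(n[i]) by least squares (no intercept term).
// Preconditions: n and time are non-empty and have the same size.
FitResult FitComplexitySketch(const std::vector<double>& n,
                              const std::vector<double>& time,
                              double (*curve)(double)) {
  double sum_tg = 0.0, sum_gg = 0.0, sum_t = 0.0;
  for (std::size_t i = 0; i < n.size(); ++i) {
    const double g = curve(n[i]);
    sum_tg += time[i] * g;
    sum_gg += g * g;
    sum_t += time[i];
  }
  const double coef = sum_tg / sum_gg;

  // Root-mean-square of the residuals, normalized by the mean measured time.
  double sq_err = 0.0;
  for (std::size_t i = 0; i < n.size(); ++i) {
    const double diff = time[i] - coef * curve(n[i]);
    sq_err += diff * diff;
  }
  const double count = static_cast<double>(n.size());
  const double rms = std::sqrt(sq_err / count) / (sum_t / count);
  return FitResult{coef, rms};
}
```

Passing `[](double x) { return x; }` as `curve` corresponds to a linear,
O(N)-style fit; a small `rms` indicates that the chosen curve describes the
measurements well.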

<a name="custom-benchmark-name" />

## Custom Benchmark Name

You can change the benchmark's name as follows:

```c++
BENCHMARK(BM_memcpy)->Name("memcpy")->RangeMultiplier(2)->Range(8, 8<<10);
```

The invocation will execute the benchmark as before using `BM_memcpy` but changes
the prefix in the report to `memcpy`.

<a name="templated-benchmarks" />

## Templated Benchmarks

This example produces and consumes messages of size `sizeof(v)`,
`state.range(0)` times per iteration. It also outputs throughput in the
absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
// C++03
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);

// C++11 or newer, you can use the BENCHMARK macro with template parameters:
BENCHMARK(BM_Sequential<WaitQueue<int>>)->Range(1<<0, 1<<10);
```
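
The example above uses `WaitQueue<int>` without defining it. As a minimal
sketch only, a hypothetical queue that offers the three things the benchmark
relies on (`value_type`, `push` and `Wait`) could look like this. The class
below is an assumption made for illustration and is not part of the library.

```c++
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical blocking queue; the guide uses WaitQueue<int> without defining it.
template <class T>
class WaitQueue {
 public:
  using value_type = T;

  // Enqueue a copy of `v` and wake one waiting consumer.
  void push(const T& v) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      items_.push(v);
    }
    cv_.notify_one();
  }

  // Block until an item is available, then move it into `*out`.
  void Wait(T* out) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !items_.empty(); });
    *out = std::move(items_.front());
    items_.pop();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<T> items_;
};
```

Because the benchmark pushes every message before it waits, `Wait` never
blocks in the single-threaded case shown above.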

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK(func<...>) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

<a name="templated-benchmarks-with-arguments" />

## Templated Benchmarks that take arguments

Sometimes a benchmark needs both template parameters and extra arguments.

```c++
template <class Q> void BM_Sequential_With_Step(benchmark::State& state, int step) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    // Count down in increments of `step`; stop once the counter reaches zero
    // or below, so a step that does not divide the range still terminates.
    for (int i = state.range(0); i > 0; i -= step)
      q.push(v);
    for (int e = state.range(0); e > 0; e -= step)
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}

BENCHMARK_TEMPLATE1_CAPTURE(BM_Sequential_With_Step, WaitQueue<int>, Step1, 1)->Range(1<<0, 1<<10);
```

<a name="fixtures" />

## Fixtures

Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {
public:
  void SetUp(::benchmark::State& state) {
  }

  void TearDown(::benchmark::State& state) {
  }
};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

### Templated Fixtures

You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:

```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

<a name="custom-counters" />

## Custom Counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.
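
The per-thread summation described above can be pictured with a small
standalone sketch. `CounterMap` and `MergeThreadCounters` are hypothetical
names chosen for illustration; the sketch uses plain `double` values instead
of `Counter` and does not reproduce the library's own code.

```c++
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one thread's `state.counters`.
using CounterMap = std::map<std::string, double>;

// Sum counters by name across all threads.
CounterMap MergeThreadCounters(const std::vector<CounterMap>& per_thread) {
  CounterMap total;
  for (const CounterMap& counters : per_thread) {
    for (const auto& entry : counters) {
      total[entry.first] += entry.second;  // a missing key starts at 0.0
    }
  }
  return total;
}
```

With two threads that each set `Foo` to 1 and `Bar` to 2, the merged map holds
`Foo` = 2 and `Bar` = 4.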

The `Counter` constructor accepts three parameters: the value as a `double`;
a bit flag that controls how the counter is presented (as a rate, as a
per-thread average, as a per-iteration average, as an iteration-invariant
quantity, and/or with the result inverted); and a flag specifying the 'unit',
i.e. whether 1k means 1000 (default, `benchmark::Counter::OneK::kIs1000`) or
1024 (`benchmark::Counter::OneK::kIs1024`).

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  // Meaning: per one second, how many 'foo's are processed?
  state.counters["FooRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark, and the result inverted.
  // Meaning: how many seconds does it take to process one 'foo'?
  state.counters["FooInvRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate | benchmark::Counter::kInvert);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] = benchmark::Counter(state.range(0), benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024);
```
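
To make the effect of the flags concrete, here is a standalone sketch of how
a reported value could be derived from the raw counter. The enum below only
borrows the flag names from the text; its bit values, the function
`PresentCounter` and the order in which the adjustments are applied are
assumptions for illustration, not the library's definitions.

```c++
#include <cstdint>

// Hypothetical flag bits named after the benchmark::Counter flags in the text.
enum CounterFlags : unsigned {
  kDefaults = 0,
  kIsRate = 1u << 0,                // divide by elapsed seconds
  kAvgThreads = 1u << 1,            // divide by thread count
  kIsIterationInvariant = 1u << 2,  // multiply by iteration count
  kAvgIterations = 1u << 3,         // divide by iteration count
  kInvert = 1u << 4,                // report 1 / value
};

// Compute the value a reporter would show for `value` under `flags`.
double PresentCounter(double value, unsigned flags, double seconds,
                      std::int64_t iterations, int threads) {
  if (flags & kIsRate) value /= seconds;
  if (flags & kAvgThreads) value /= threads;
  if (flags & kIsIterationInvariant) value *= static_cast<double>(iterations);
  if (flags & kAvgIterations) value /= static_cast<double>(iterations);
  if (flags & kInvert) value = 1.0 / value;
  return value;
}
```

For instance, 100 events over 2 seconds with `kIsRate` gives 50 per second,
and adding `kInvert` turns that into 0.02 seconds per event.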

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

<!-- {% raw %} -->
```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```
<!-- {% endraw %} -->

### Counter Reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```

Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

<a name="multithreaded-benchmarks"/>

## Multithreaded Benchmarks

In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against
the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index() == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index() == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

To run the benchmark across a range of thread counts, use `ThreadRange`
instead of `Threads`. It takes two parameters (`min_threads` and
`max_threads`) and runs the benchmark once for each of a set of thread counts
picked from that inclusive range; both endpoints are always included. For
example:

```c++
BENCHMARK(BM_MultiThreaded)->ThreadRange(1, 8);
```

will run `BM_MultiThreaded` with thread counts 1, 2, 4, and 8.
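
The sequence 1, 2, 4, 8 is consistent with picking the two endpoints plus the
powers of two that lie between them. The standalone sketch below reproduces
that selection; `ThreadCounts` is a hypothetical helper, and the
power-of-two rule is an assumption inferred from the example rather than a
statement of the library's implementation.

```c++
#include <vector>

// Hypothetical helper: thread counts assumed for ThreadRange(min, max).
// Expects 1 <= min_threads <= max_threads and small values (no overflow check).
std::vector<int> ThreadCounts(int min_threads, int max_threads) {
  std::vector<int> counts;
  counts.push_back(min_threads);  // lower endpoint is always included
  for (int p = 1; p < max_threads; p *= 2) {
    if (p > min_threads) counts.push_back(p);  // powers of two in between
  }
  if (max_threads != min_threads) counts.push_back(max_threads);
  return counts;
}
```

Under this assumption `ThreadCounts(1, 8)` yields 1, 2, 4, 8, and a range such
as `(3, 10)` would yield 3, 4, 8, 10.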

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

<a name="cpu-timers" />

## CPU Timers

By default, the CPU timer only measures the time spent by the main thread.
If the benchmark itself uses threads internally, this measurement may not
be what you are looking for. Instead, there is a way to measure the total
CPU usage of the process, by all the threads.

```c++
void callee(int i);

static void MyMain(int size) {
#pragma omp parallel for
  for(int i = 0; i < size; i++)
    callee(i);
}

static void BM_OpenMP(benchmark::State& state) {
  for (auto _ : state)
    MyMain(state.range(0));
}

// Measure the time spent by the main thread, and use it to decide for how long
// to run the benchmark loop. Depending on internal implementation details, this
// may measure anywhere from near-zero (the overhead spent before/after work
// handoff to worker thread[s]) to the whole single-thread time.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10);

// Measure the user-visible time, the wall clock (literally, the time that
// has passed on the clock on the wall), and use it to decide for how long to
// run the benchmark loop. This will always be meaningful, and will match the
// time spent by the main thread in the single-threaded case, in general
// decreasing with the number of internal threads doing the work.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->UseRealTime();

// Measure the total CPU consumption, and use it to decide for how long to
// run the benchmark loop. This will always measure no less than the
// time spent by the main thread in the single-threaded case.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime();

// A mixture of the last two. Measure the total CPU consumption, but use the
// wall clock to decide for how long to run the benchmark loop.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime()->UseRealTime();
```
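
The three quantities contrasted above (main-thread CPU time, process CPU time
and wall-clock time) can be read directly on a POSIX system. The sketch below
is only meant to make the distinction tangible: the helper names are
hypothetical, it assumes `clock_gettime` with the POSIX CPU-time clocks is
available, and it is not a description of how the library measures time.

```c++
#include <chrono>
#include <time.h>  // POSIX clock_gettime and CPU-time clock ids

// Hypothetical helper: read a POSIX clock and convert it to seconds.
static double ReadClockSeconds(clockid_t id) {
  timespec ts{};
  clock_gettime(id, &ts);
  return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) * 1e-9;
}

// CPU time consumed by the calling thread only.
double ThreadCpuSeconds() { return ReadClockSeconds(CLOCK_THREAD_CPUTIME_ID); }

// CPU time consumed by every thread in the process.
double ProcessCpuSeconds() { return ReadClockSeconds(CLOCK_PROCESS_CPUTIME_ID); }

// Wall-clock time from a monotonic clock.
double WallSeconds() {
  using Seconds = std::chrono::duration<double>;
  return Seconds(std::chrono::steady_clock::now().time_since_epoch()).count();
}
```

Taking the difference of two readings around a region gives the time that
region consumed on the corresponding clock. With worker threads doing most of
the work, the process CPU delta grows while the calling thread's delta can
stay near zero.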

### Controlling Timers

Normally, the entire duration of the work loop (`for (auto _ : state) {}`)
is measured. But sometimes, it is necessary to do some work inside of
that loop, every iteration, without counting that time toward the benchmark
time. That is possible, although it is not recommended, since it has high
overhead.

<!-- {% raw %} -->
```c++
static void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured
    state.ResumeTiming(); // And resume timers. They are now counting again.
    // The rest will be measured.
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});
```
<!-- {% endraw %} -->
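
The example calls `ConstructRandomSet` and `RandomNumber` without showing
them. One possible pair of definitions is sketched below; both bodies are
assumptions written for illustration, including the fixed seed and the choice
to keep drawing until the set holds exactly the requested number of elements.

```c++
#include <cstdint>
#include <random>
#include <set>

// Hypothetical helper: one pseudo-random int from a fixed-seed generator.
int RandomNumber() {
  static std::mt19937 gen(12345);  // fixed seed keeps runs reproducible
  static std::uniform_int_distribution<int> dist;
  return dist(gen);
}

// Hypothetical helper: a set holding exactly `size` distinct random values.
std::set<int> ConstructRandomSet(std::int64_t size) {
  std::set<int> s;
  while (static_cast<std::int64_t>(s.size()) < size) {
    s.insert(RandomNumber());  // duplicates are ignored, so keep drawing
  }
  return s;
}
```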

<a name="manual-timing" />

## Manual Timing

For benchmarking something for which neither CPU time nor real-time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, they can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

<a name="setting-the-time-unit" />

## Setting the Time Unit

If a benchmark runs for a few milliseconds it may be hard to visually compare
the measured times, since the output data is given in nanoseconds by default.
To change this, set the time unit explicitly:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

Additionally, the default time unit can be set globally with the
`--benchmark_time_unit={ns|us|ms|s}` command line argument. The argument only
affects benchmarks where the time unit is not set explicitly.
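
Changing the unit only rescales the reported numbers. As a rough
illustration, the hypothetical helper below maps each value accepted by
`--benchmark_time_unit` to the factor that converts seconds into that unit;
the function name and its fallback to nanoseconds are assumptions, not
library code.

```c++
#include <string>

// Hypothetical helper: factor that converts seconds into the named unit.
double UnitMultiplier(const std::string& unit) {
  if (unit == "s") return 1.0;
  if (unit == "ms") return 1e3;
  if (unit == "us") return 1e6;
  return 1e9;  // "ns", the default unit
}
```

A run of 0.004 seconds is then shown as roughly 4 with `ms` and as roughly
4000000 with `ns`.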

<a name="preventing-optimization" />

## Preventing Optimization

To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
    int x = 0;
    for (int i=0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU-based compilers it acts as a read/write barrier
for global memory. More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of `<expr>` is only reused. */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    auto data = v.data();           // Allow v.data() to be clobbered. Pass as non-const
    benchmark::DoNotOptimize(data); // lvalue to avoid undesired compiler optimizations
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.
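
For intuition about what such barriers look like on GNU-style compilers, the
sketch below builds two stand-ins from empty inline assembly statements. The
names `do_not_optimize` and `clobber_memory` are hypothetical, the code
assumes GCC or Clang extended `asm` syntax, and it is a simplified
illustration rather than the library's actual definitions.

```c++
// Hypothetical stand-ins; GNU-style inline asm, so GCC or Clang is assumed.
template <class T>
inline void do_not_optimize(T const& value) {
  // Empty asm that claims to read `value` (from a register or from memory)
  // and to touch memory, so the value has to be materialized first.
  asm volatile("" : : "r,m"(value) : "memory");
}

inline void clobber_memory() {
  // Empty asm that claims to read and write all memory, so pending writes
  // to escaped objects cannot be dropped or reordered across it.
  asm volatile("" : : : "memory");
}
```

Neither statement emits any instructions; the effect comes entirely from
what the compiler is told the statement might read or modify.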

<a name="reporting-statistics" />

## Statistics: Reporting the Mean, Median and Standard Deviation / Coefficient of variation of Repeated Benchmarks

By default each benchmark is run once and that single result is reported.
However benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once the mean, median, standard deviation and coefficient of variation
of the runs will be reported.
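
As a reference for what these four aggregates mean, here is a standalone
sketch that computes them over a vector of per-run times. The function names
are hypothetical, and the use of the sample standard deviation (dividing by
n - 1) is an assumption made for this sketch.

```c++
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// All functions expect a non-empty vector of per-run measurements.
double Mean(const std::vector<double>& v) {
  double sum = 0.0;
  for (double x : v) sum += x;
  return sum / static_cast<double>(v.size());
}

double Median(std::vector<double> v) {  // taken by value: a copy is sorted
  std::sort(v.begin(), v.end());
  const std::size_t mid = v.size() / 2;
  return v.size() % 2 == 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

// Sample standard deviation (n - 1 in the denominator); an assumption here.
double StdDev(const std::vector<double>& v) {
  if (v.size() < 2) return 0.0;
  const double m = Mean(v);
  double sq = 0.0;
  for (double x : v) sq += (x - m) * (x - m);
  return std::sqrt(sq / static_cast<double>(v.size() - 1));
}

// Standard deviation relative to the mean.
double CoefficientOfVariation(const std::vector<double>& v) {
  return StdDev(v) / Mean(v);
}
```

The coefficient of variation is unitless, which makes it convenient for
comparing the noise of benchmarks whose absolute run times differ widely.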

Additionally, the `--benchmark_report_aggregates_only={true|false}` and
`--benchmark_display_aggregates_only={true|false}` flags, or the
`ReportAggregatesOnly(bool)` and `DisplayAggregatesOnly(bool)` functions, can
be used to change how repeated tests are reported. By default the result of
each repeated run is reported. When the `report aggregates only` option is
`true`, only the aggregates (i.e. mean, median, standard deviation and
coefficient of variation, plus complexity measurements if they were requested)
of the runs are reported, to both reporters: standard output (console) and the
file. However, when only the `display aggregates only` option is `true`,
only the aggregates are displayed in the standard output, while the file
output still contains everything.
Calling `ReportAggregatesOnly(bool)` / `DisplayAggregatesOnly(bool)` on a
registered benchmark object overrides the value of the appropriate flag for that
benchmark.
1077a290770fSMircea Trofin
<a name="custom-statistics" />

## Custom Statistics

While having these aggregates is nice, this may not be enough for everyone.
For example you may want to know what the largest observation is, e.g. because
you have some real-time constraints. This is easy. The following code will
specify a custom statistic to be calculated, defined by a lambda function.

```c++
#include <algorithm>  // std::max_element
#include <vector>

#include <benchmark/benchmark.h>

void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    // Largest observation across all repetitions.
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

While usually the statistics produce values in time units,
you can also produce percentages:

```c++
#include <vector>

#include <benchmark/benchmark.h>

void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("ratio", [](const std::vector<double>& v) -> double {
    // Ratio of the first observation to the last one.
    return v.front() / v.back();
  }, benchmark::StatisticUnit::kPercentage)
  ->Arg(512);
```
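For real-time constraints, a high percentile is often more informative than the single largest observation. The sketch below shows one possible statistic with the same signature as the lambdas above. `NearestRankPercentile` and `kP90` are hypothetical names, and the nearest-rank definition of a percentile is one assumption among several common ones.

```c++
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest observation such that at least
// p percent of all observations are less than or equal to it.
// Takes the vector by value so the caller's data is not reordered.
double NearestRankPercentile(std::vector<double> v, double p) {
  assert(!v.empty());
  assert(p > 0.0 && p <= 100.0);
  std::sort(v.begin(), v.end());
  const std::size_t rank = static_cast<std::size_t>(
      std::ceil(p * static_cast<double>(v.size()) / 100.0));
  return v[rank - 1];  // rank is 1-based and at least 1 because p > 0
}

// Captureless lambda with the signature double(const std::vector<double>&),
// matching the statistics lambdas shown in the examples above.
const auto kP90 = [](const std::vector<double>& v) -> double {
  return NearestRankPercentile(v, 90.0);
};
```

Assuming registration works as in the examples above, such a callable would be passed as `->ComputeStatistics("p90", kP90)`.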

<a name="memory-usage" />

## Memory Usage

It's often useful to also track memory usage for benchmarks, alongside CPU
performance. For this reason, benchmark offers the `RegisterMemoryManager`
method that allows a custom `MemoryManager` to be injected.

If set, the `MemoryManager::Start` and `MemoryManager::Stop` methods will be
called at the start and end of benchmark runs to allow user code to fill out
a report on the number of allocations, bytes used, etc.

This data will then be reported alongside other performance data, currently
only when using JSON output.
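As a rough illustration of the bookkeeping such a manager needs, the following standalone sketch counts allocations and tracks peak live bytes. `AllocationTracker` and its methods are hypothetical helpers, not part of the benchmark library. Hooking it into an allocator and into a `MemoryManager` subclass is deliberately left out, since the exact `Start`/`Stop` signatures depend on the library version.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a benchmark library type): accumulates the kind
// of numbers a memory report needs between a "start" and a "stop" point.
class AllocationTracker {
 public:
  // Would be called from the manager's start hook.
  void Reset() {
    num_allocs_ = 0;
    live_bytes_ = 0;
    peak_bytes_ = 0;
  }

  // Would be called from an instrumented allocation function.
  void RecordAlloc(std::int64_t bytes) {
    assert(bytes >= 0);
    ++num_allocs_;
    live_bytes_ += bytes;
    if (live_bytes_ > peak_bytes_) peak_bytes_ = live_bytes_;
  }

  // Would be called from an instrumented deallocation function.
  void RecordFree(std::int64_t bytes) {
    assert(bytes >= 0 && bytes <= live_bytes_);
    live_bytes_ -= bytes;
  }

  // Values the manager's stop hook would copy into its report.
  std::int64_t num_allocs() const { return num_allocs_; }
  std::int64_t peak_bytes() const { return peak_bytes_; }

 private:
  std::int64_t num_allocs_ = 0;
  std::int64_t live_bytes_ = 0;
  std::int64_t peak_bytes_ = 0;
};
```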

<a name="using-register-benchmark" />

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at the global
scope, `RegisterBenchmark` can be called anywhere. This allows
benchmark tests to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:

```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
      benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
  benchmark::Shutdown();
}
```

<a name="exiting-with-an-error" />

## Exiting with an Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const std::string& msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of the
`KeepRunning()` loop are skipped. For the ranged-for version of the benchmark loop,
users must explicitly exit the loop, otherwise all iterations will be performed.
Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop. Moreover, if `SkipWithError(...)`
has been used, it is not required to reach the benchmark loop and one may return
from the benchmark function early.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    return; // Early return is allowed when SkipWithError() has been used.
  }
  for (auto _ : state) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // REQUIRED to prevent all further iterations.
    }
    do_stuff(data);
  }
}
```

<a name="a-faster-keep-running-loop" />

## A Faster KeepRunning Loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The ranged-for loop is faster than `KeepRunning` because `KeepRunning`
requires a memory load and store of the iteration count
every iteration, whereas the ranged-for variant is able to keep the iteration count
in a register.
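The difference can be illustrated with a toy model. `ToyState` below is a hypothetical stand-in, not the library's `benchmark::State`: its iterator carries the remaining count as a local value that a compiler can keep in a register, while its `KeepRunning()` updates a counter stored in the object on every call.

```c++
#include <cassert>
#include <cstdint>

// Toy stand-in for benchmark::State; NOT the library's implementation.
class ToyState {
 public:
  explicit ToyState(std::int64_t max_iterations)
      : max_iterations_(max_iterations) {
    assert(max_iterations >= 0);
  }

  struct Value {};  // what `auto _` binds to in the ranged-for loop

  class Iterator {
   public:
    explicit Iterator(std::int64_t remaining) : remaining_(remaining) {}
    Value operator*() const { return Value(); }
    Iterator& operator++() {
      --remaining_;  // local countdown, no access to ToyState memory
      return *this;
    }
    // The right-hand side (the end iterator) is ignored: the loop ends
    // when the local countdown reaches zero.
    bool operator!=(const Iterator&) const { return remaining_ != 0; }

   private:
    std::int64_t remaining_;
  };

  Iterator begin() const { return Iterator(max_iterations_); }
  Iterator end() const { return Iterator(0); }

  // KeepRunning-style loop condition: reads and writes a member each call.
  bool KeepRunning() { return completed_++ < max_iterations_; }

 private:
  std::int64_t max_iterations_;
  std::int64_t completed_ = 0;
};

// Counts how many times the ranged-for loop body executes.
std::int64_t CountRangedFor(ToyState& state) {
  std::int64_t n = 0;
  for (auto _ : state) {
    (void)_;  // silence unused-variable warnings
    ++n;
  }
  return n;
}

// Counts how many times the KeepRunning loop body executes.
std::int64_t CountKeepRunning(ToyState& state) {
  std::int64_t n = 0;
  while (state.KeepRunning()) ++n;
  return n;
}
```

Both loops run the body the same number of times; only the location of the counter differs, which is what the generated code for the real library reflects.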

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the ranged-for variant of writing
the benchmark loop should be preferred.

<a name="disabling-cpu-frequency-scaling" />

## Disabling CPU Frequency Scaling

If you see this warning:

```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may
be noisy and will incur extra overhead.
```

you might want to disable CPU frequency scaling while running the
benchmark, as well as consider other ways to stabilize the performance of
your system while benchmarking.

See [Reducing Variance](reducing_variance.md) for more information.