# benchmark
[![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
[![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)
[![slackin](https://slackin-iqtfqnpzxd.now.sh/badge.svg)](https://slackin-iqtfqnpzxd.now.sh/)

A library to support the benchmarking of functions, similar to unit tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark

[Known issues and common problems](#known-issues)

[Additional Tooling Documentation](docs/tools.md)

[Assembly Testing Documentation](docs/AssemblyTests.md)


## Building

The basic steps for configuring and building the library look like this:

```bash
$ git clone https://github.com/google/benchmark.git
# Benchmark requires Google Test as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
$ mkdir build && cd build
$ cmake -G <generator> [options] ../benchmark
# Assuming a makefile generator was used
$ make
```

Note that Google Benchmark requires Google Test to build and run the tests. This
dependency can be provided in two ways:

* Check out the Google Test sources into `benchmark/googletest`, as above.
* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during
  configuration, the library will automatically download and build any required
  dependencies.

If you do not wish to build and run the tests, add `-DBENCHMARK_ENABLE_GTEST_TESTS=OFF`
to `CMAKE_ARGS`.


## Installation Guide

For Ubuntu and Debian based systems:

First, make sure you have git and cmake installed; if not, install them:

```
sudo apt-get install git
sudo apt-get install cmake
```

Now clone the repository and build it:

```
git clone https://github.com/google/benchmark.git
cd benchmark
git clone https://github.com/google/googletest.git
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE
make
```

Install the library globally:

```
sudo make install
```

google/benchmark is now installed on your machine.
Note: don't forget to link against the pthread library while building.

## Stable and Experimental Library Versions

The main branch contains the latest stable version of the benchmarking library;
its API can be considered largely stable, with source-breaking changes
being made only upon the release of a new major version.

Newer, experimental features are implemented and tested on the
[`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish
to use, test, and provide feedback on the new features are encouraged to try
this branch. However, this branch provides no stability guarantees and reserves
the right to change and break the API at any time.

## Prerequisite knowledge

Before attempting to understand this framework, one should ideally have some
familiarity with the structure and format of the Google Test framework, upon
which it is based. Documentation for Google Test, including a "Getting Started"
(primer) guide, is available here:
https://github.com/google/googletest/blob/master/googletest/docs/Documentation.md


## Example usage
### Basic usage
Define a function that executes the code to be measured.

```c++
#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Don't forget to tell your linker to add the benchmark library, e.g. through
the `-lbenchmark` compilation flag. Alternatively, you may leave out the
`BENCHMARK_MAIN();` at the end of the source file and link against
`-lbenchmark_main` to get the same default behavior.

The benchmark library will report the timing of the code within the `for (...)` loop.

### Passing arguments
Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated by successive multiplication
by eight, so the command above selects [ 8, 64, 512, 4k, 8k ]. In the following
code the range multiplier is changed to two:

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```
The arguments generated are now [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Calculate asymptotic complexity (Big O)
Asymptotic complexity can be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time, as well as the normalized root-mean-square error, for string
comparison:

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, the asymptotic complexity can also be
deduced automatically:

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code specifies the asymptotic complexity with a lambda function,
which can be used to customize the high-order term calculation:

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; });
```

### Templated benchmarks
Templated benchmarks work the same way. This example produces and consumes
messages of size `sizeof(v)`, `state.range(0)` times per iteration. It also
outputs throughput in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

### A Faster KeepRunning loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The range-based for loop is faster than using `KeepRunning` because
`KeepRunning` requires a memory load and store of the iteration count
every iteration, whereas the range-based for variant is able to keep the
iteration count in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the range-based for variant of writing
the benchmark loop should be preferred.

## Passing arbitrary arguments to a benchmark
In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
```
Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at
global scope, `RegisterBenchmark` can be called anywhere. This allows for
benchmark tests to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
      benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```

### Multithreaded benchmarks
In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against
the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.


## Manual timing
For benchmarking something for which neither CPU time nor real-time are
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, it can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end   = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

### Preventing optimisation
To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
      int x = 0;
      for (int i=0; i < 64; ++i) {
        benchmark::DoNotOptimize(x += i);
      }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU-based compilers it acts as a read/write barrier
for global memory. More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of '<expr>' is only reused. */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block-scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.

### Set time unit manually
If a benchmark runs for a few milliseconds, it may be hard to visually compare
the measured times, since the output data is given in nanoseconds by default.
The time unit can be set explicitly:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

## Controlling number of iterations
In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the benchmark
runs at least one iteration and at most 1e9 iterations, stopping once the CPU
time exceeds the minimum time, or the wallclock time exceeds 5x the minimum
time. The minimum time is set by the `--benchmark_min_time` flag, or per
benchmark by calling `MinTime` on the registered benchmark object.

## Reporting the mean, median and standard deviation of repeated benchmarks
By default each benchmark is run once and that single result is reported.
However, benchmarks are often noisy and a single result may not be
representative of the overall behavior. For this reason it's possible to
repeatedly rerun the benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per-benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once, the mean, median and standard deviation of the runs will be reported.

Additionally the `--benchmark_report_aggregates_only={true|false}` flag or
`ReportAggregatesOnly(bool)` function can be used to change how repeated tests
are reported. By default the result of each repeated run is reported. When this
option is `true`, only the mean, median and standard deviation of the runs are
reported. Calling `ReportAggregatesOnly(bool)` on a registered benchmark object
overrides the value of the flag for that benchmark.

## User-defined statistics for repeated benchmarks
While having the mean, median and standard deviation is nice, it may not be
enough for everyone. For example, you may want to know the largest observation,
e.g. because you have some real-time constraints. This is easy. The following
code specifies a custom statistic to be calculated, defined by a lambda
function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

## Fixtures
Fixture tests are created by
first defining a type that derives from `::benchmark::Fixture` and then
creating/registering the tests using the following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

604*7330f729Sjoerg### Templated fixtures
605*7330f729SjoergAlso you can create templated fixture by using the following macros:
606*7330f729Sjoerg
607*7330f729Sjoerg* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
608*7330f729Sjoerg* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`
609*7330f729Sjoerg
610*7330f729SjoergFor example:
```c++
template <typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

## User-defined counters

You can add your own counters with user-defined names. The example below
adds the columns "Foo", "Bar" and "Baz" to its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo, Bar and Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=`, `+=`, `-=`, `*=`, `/=`) to change the value of each
counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread are summed;
the resulting sum is the value shown for the benchmark.

The `Counter` constructor accepts two parameters: the value as a `double`
and a bit flag which allows you to show counters as rates and/or as
per-thread averages:

```c++
  // Sets a simple counter.
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = Counter(numFoos, benchmark::Counter::kAvgThreadsRate);
```

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```

### Counter reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of the corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```

Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

## Exiting Benchmarks in Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of
`KeepRunning()` are skipped. For the ranged-for version of the benchmark loop,
users must explicitly exit the loop; otherwise all iterations will be performed.
Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  state.SkipWithError("test will not be entered");
  for (auto _ : state) {
    state.SkipWithError("Failed!");
    break; // REQUIRED to prevent all further iterations.
  }
}
```

## Running a subset of the benchmarks

The `--benchmark_filter=<regex>` option can be used to run only the benchmarks
that match the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```

## Output Formats

The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag to set the format type. `console`
is the default format.

The console format is intended to be human-readable. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:

```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:

```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:

```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

## Output Files

The library supports writing the output of the benchmark to a file specified
by `--benchmark_out=<filename>`. The format of the output can be specified
using `--benchmark_out_format={json|console|csv}`. Specifying
`--benchmark_out` does not suppress the console output.

## Debug vs Release

By default, benchmark builds as a debug library. You will see a warning in the
output when this is the case. To build it as a release library instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

If you are using gcc and autodetection fails, you might need to set the
`GCC_AR` and `GCC_RANLIB` CMake cache variables.
If you are using clang, you may need to set the `LLVMAR_EXECUTABLE`,
`LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` CMake cache variables.
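
For example, an LTO configuration with clang might look like the following
(the tool paths are illustrative and depend on your installation):

```shell
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true \
      -DLLVMAR_EXECUTABLE=/usr/bin/llvm-ar \
      -DLLVMNM_EXECUTABLE=/usr/bin/llvm-nm \
      -DLLVMRANLIB_EXECUTABLE=/usr/bin/llvm-ranlib \
      ../benchmark
```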

## Linking against the library

When the library is built using GCC it is necessary to link with `-pthread`,
due to how GCC implements `std::thread`.

With GCC 4.x, failing to link against pthreads leads to runtime exceptions,
not linker errors.
See [issue #67](https://github.com/google/benchmark/issues/67) for more details.
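
A typical GCC link line might look like the following (the source file name
and install paths are hypothetical):

```shell
g++ mybenchmark.cc -std=c++11 \
    -isystem /usr/local/include \
    -L/usr/local/lib -lbenchmark -lpthread \
    -o mybenchmark
```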

## Compiler Support

Google Benchmark uses C++11 when building the library. As such we require
a modern C++ toolchain, both compiler and standard library.

The following minimum versions are strongly recommended to build the library:

* GCC 4.8
* Clang 3.4
* Visual Studio 2013
* Intel 2015 Update 1

Anything older *may* work.

Note: Using the library and its headers in C++03 is supported. C++11 is only
required to build the library.

## Disable CPU frequency scaling

If you see this error:

```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
```

you might want to disable CPU frequency scaling while running the benchmark:

```bash
sudo cpupower frequency-set --governor performance
./mybench
sudo cpupower frequency-set --governor powersave
```

## Known Issues

### Windows with CMake

* Users must manually link `shlwapi.lib`. Failure to do so may result
  in unresolved symbols.

### Solaris

* Users must explicitly link with the kstat library (`-lkstat` compilation
  flag).