# benchmark
[Build Status (Travis CI)](https://travis-ci.org/google/benchmark)
[Build Status (AppVeyor)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[Coverage Status](https://coveralls.io/r/google/benchmark)
[Slack](https://slackin-iqtfqnpzxd.now.sh/)

A library to support the benchmarking of functions, similar to unit tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark

[Known issues and common problems](#known-issues)

[Additional Tooling Documentation](docs/tools.md)

[Assembly Testing Documentation](docs/AssemblyTests.md)


## Building

The basic steps for configuring and building the library look like this:

```bash
$ git clone https://github.com/google/benchmark.git
# Benchmark requires Google Test as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
$ mkdir build && cd build
$ cmake -G <generator> [options] ../benchmark
# Assuming a makefile generator was used
$ make
```

Note that Google Benchmark requires Google Test to build and run the tests. This
dependency can be provided in two ways:

* Check out the Google Test sources into `benchmark/googletest` as above.
* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during
  configuration, the library will automatically download and build any required
  dependencies.

If you do not wish to build and run the tests, add `-DBENCHMARK_ENABLE_GTEST_TESTS=OFF`
to `CMAKE_ARGS`.


## Installation Guide

For Ubuntu and Debian-based systems:

First, make sure you have git and cmake installed (if not, install them):

```
sudo apt-get install git
sudo apt-get install cmake
```

Now, let's clone the repository and build it:

```
git clone https://github.com/google/benchmark.git
cd benchmark
git clone https://github.com/google/googletest.git
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE
make
```

We now need to install the library globally:

```
sudo make install
```

Now you have google/benchmark installed on your machine.
Note: don't forget to link against the pthread library while building.

## Stable and Experimental Library Versions

The main branch contains the latest stable version of the benchmarking library;
its API can be considered largely stable, with source-breaking changes
being made only upon the release of a new major version.
Newer, experimental features are implemented and tested on the
[`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish
to use, test, and provide feedback on the new features are encouraged to try
this branch. However, this branch provides no stability guarantees and reserves
the right to change and break the API at any time.

## Prerequisite knowledge

Before attempting to understand this framework, one should ideally have some familiarity with the structure and format of the Google Test framework, upon which it is based. Documentation for Google Test, including a "Getting Started" (primer) guide, is available here:
https://github.com/google/googletest/blob/master/googletest/docs/Documentation.md


## Example usage
### Basic usage
Define a function that executes the code to be measured.

```c++
#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Don't forget to inform your linker to add the benchmark library, e.g.
through the `-lbenchmark` compilation flag. Alternatively, you may leave out the
`BENCHMARK_MAIN();` at the end of the source file and link against
`-lbenchmark_main` to get the same default behavior.

The benchmark library will report the timing of the code within the `for (...)` loop.

### Passing arguments
Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.
```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```
Now the generated arguments are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand.
The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Calculate asymptotic complexity (Big O)
Asymptotic complexity might be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.
```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity might also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code will specify asymptotic complexity with a lambda function,
which might be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; });
```

### Templated benchmarks
Templated benchmarks work the same way: this example produces and consumes
messages of size `sizeof(v)` `range_x` times. It also outputs throughput in the
absence of multiprogramming.
```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

### A Faster KeepRunning loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks.
For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The reason the range-based for loop is faster than using `KeepRunning` is
that `KeepRunning` requires a memory load and store of the iteration count
every iteration, whereas the range-based variant is able to keep the iteration count
in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```
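For reference, the legacy `KeepRunning` loop whose generated code is shown above is written at the source level as follows; this is a minimal sketch, with `SlowOperation()` as a hypothetical stand-in for the measured code:

```c++
static void BM_Slow(benchmark::State &state) {
  // Pre-C++11 loop style: each call to KeepRunning() loads and stores the
  // iteration count through the State object, as in the assembly above.
  while (state.KeepRunning()) {
    SlowOperation();
  }
}
BENCHMARK(BM_Slow);
```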
Unless C++03 compatibility is required, the range-based variant of writing
the benchmark loop should be preferred.

## Passing arbitrary arguments to a benchmark
In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
```
Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.
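Because `RegisterBenchmark` returns a pointer to the benchmark object, the usual builder methods can be chained onto it. A minimal sketch, reusing the `BM_memcpy` function defined earlier (the name string here is arbitrary):

```c++
int main(int argc, char** argv) {
  // Register one instance programmatically, then run as usual.
  benchmark::RegisterBenchmark("BM_memcpy/registered", BM_memcpy)->Arg(8);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```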
Unlike the `BENCHMARK` registration macros, which can only be used at global
scope, `RegisterBenchmark` can be called anywhere. This allows
benchmarks to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
    benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```

### Multithreaded benchmarks
In a multithreaded test (a benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against the thread
index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.


## Manual timing
For benchmarking something for which neither CPU time nor real time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real time. Instead, it can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.
```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
        std::chrono::duration_cast<std::chrono::duration<double>>(
            end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

### Preventing optimisation
To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
    int x = 0;
    for (int i = 0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU-based compilers it acts as a read/write barrier
for global memory.
More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of '<expr>' is only reused */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence,
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block-scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU- or MSVC-based compilers.

### Set time unit manually
If a benchmark runs for a few milliseconds, it may be hard to visually compare the
measured times, since the output data is given in nanoseconds by default. To
change this, you can set the time unit manually:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

## Controlling number of iterations
In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the number of
iterations is at least one and not more than 1e9, and iteration continues until the CPU time
exceeds the minimum time, or the wallclock time is 5x the minimum time. The minimum time is
set with the flag `--benchmark_min_time`, or per benchmark by calling `MinTime` on
the registered benchmark object.

## Reporting the mean, median and standard deviation by repeated benchmarks
By default each benchmark is run once and that single result is reported.
However, benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per-benchmark basis by calling
`Repetitions` on the registered benchmark object.
When a benchmark is run more
than once, the mean, median and standard deviation of the runs will be reported.

Additionally, the `--benchmark_report_aggregates_only={true|false}` flag or the
`ReportAggregatesOnly(bool)` function can be used to change how repeated tests
are reported. By default the result of each repeated run is reported. When this
option is `true`, only the mean, median and standard deviation of the runs are reported.
Calling `ReportAggregatesOnly(bool)` on a registered benchmark object overrides
the value of the flag for that benchmark.

## User-defined statistics for repeated benchmarks
While having the mean, median and standard deviation is nice, this may not be
enough for everyone. For example, you may want to know what the largest
observation is, e.g. because you have some real-time constraints. This is easy.
The following code will specify a custom statistic to be calculated, defined
by a lambda function.
```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

## Fixtures
Fixture tests are created by
first defining a type that derives from `::benchmark::Fixture` and then
creating/registering the tests using the following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

### Templated fixtures
You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:
```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

## User-defined counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo, Bar and Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.

The `Counter` constructor accepts two parameters: the value as a `double`
and a bit flag which allows you to show counters as rates and/or as
per-thread averages:

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = Counter(numFoos, benchmark::Counter::kAvgThreadsRate);
```

When compiling in C++11 mode or later, you can use `insert()` with a
`std::initializer_list`:

```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```

### Counter reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as `bytes_processed` and
`items_processed`. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks.
Here's an example of the corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k    2434095 ns   2434186 ns       288 3.1416
BM_CalculatePiRange/1024k   9721140 ns   9721413 ns        71 3.14159
BM_CalculatePi/threads:8       2255 ns      9943 ns     70936
```
Note the additional header printed when the benchmark changes from
`BM_UserCounter` to `BM_Factorial`. This is because `BM_Factorial` does
not have the same counter set as `BM_UserCounter`.

## Exiting Benchmarks in Error

When errors caused by external influences, such as file I/O or network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of
`KeepRunning()` are skipped. For the ranged-for version of the benchmark loop,
users must explicitly exit the loop, otherwise all iterations will be performed.
Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  state.SkipWithError("test will not be entered");
  for (auto _ : state) {
    state.SkipWithError("Failed!");
    break; // REQUIRED to prevent all further iterations.
  }
}
```

## Running a subset of the benchmarks

The `--benchmark_filter=<regex>` option can be used to only run the benchmarks
that match the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```


## Output Formats
The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag to set the format type. `console`
is the default format.

The Console format is intended to be a human readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout.
Example tabular output looks like:
```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s  33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s  237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s  290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

## Output Files
The library supports writing the output of the benchmark to a file specified
by `--benchmark_out=<filename>`. The format of the output can be specified
using `--benchmark_out_format={json|console|csv}`. Specifying
`--benchmark_out` does not suppress the console output.

## Debug vs Release
By default, benchmark builds as a debug library. You will see a warning in the
output when this is the case. To build it as a release library instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

If you are using gcc, you might need to set the `GCC_AR` and `GCC_RANLIB` CMake
cache variables if autodetection fails.
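If autodetection fails, those cache variables can be set explicitly at
configure time. A sketch of such an invocation (the `gcc-ar`/`gcc-ranlib`
installation paths are illustrative and vary by system):

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true \
      -DGCC_AR=/usr/bin/gcc-ar -DGCC_RANLIB=/usr/bin/gcc-ranlib ../benchmark
```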
If you are using clang, you may need to set the `LLVMAR_EXECUTABLE`,
`LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` CMake cache variables.

## Linking against the library

When the library is built using GCC it is necessary to link with `-pthread`,
due to how GCC implements `std::thread`.

For GCC 4.x, failing to link to pthreads will lead to runtime exceptions, not
linker errors. See [issue #67](https://github.com/google/benchmark/issues/67)
for more details.

## Compiler Support

Google Benchmark uses C++11 when building the library. As such we require
a modern C++ toolchain, both compiler and standard library.

The following minimum versions are strongly recommended to build the library:

* GCC 4.8
* Clang 3.4
* Visual Studio 2013
* Intel 2015 Update 1

Anything older *may* work.

Note: Using the library and its headers in C++03 is supported. C++11 is only
required to build the library.

## Disable CPU frequency scaling
If you see this error:
```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
```
you might want to disable the CPU frequency scaling while running the
benchmark:
```bash
sudo cpupower frequency-set --governor performance
./mybench
sudo cpupower frequency-set --governor powersave
```

# Known Issues

### Windows with CMake

* Users must manually link `shlwapi.lib`. Failure to do so may result
in unresolved symbols.

### Solaris

* Users must explicitly link with the kstat library (`-lkstat` compilation flag).
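Both of the platform-specific link requirements above can be handled from a
consuming project's `CMakeLists.txt`. A minimal sketch, assuming a benchmark
executable target named `mybench` (the target name is illustrative):

```
if(WIN32)
  target_link_libraries(mybench shlwapi)
elseif(CMAKE_SYSTEM_NAME STREQUAL "SunOS")
  target_link_libraries(mybench kstat)
endif()
```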