# benchmark
[Build Status (Travis CI)](https://travis-ci.org/google/benchmark)
[Build Status (AppVeyor)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[Coverage Status](https://coveralls.io/r/google/benchmark)
[Slack](https://slackin-iqtfqnpzxd.now.sh/)

A library to support the benchmarking of functions, similar to unit tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark

[Known issues and common problems](#known-issues)

[Additional Tooling Documentation](docs/tools.md)

[Assembly Testing Documentation](docs/AssemblyTests.md)


## Building

The basic steps for configuring and building the library look like this:

```bash
$ git clone https://github.com/google/benchmark.git
# Benchmark requires Google Test as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
$ mkdir build && cd build
$ cmake -G <generator> [options] ../benchmark
# Assuming a makefile generator was used
$ make
```

Note that Google Benchmark requires Google Test to build and run the tests. This
dependency can be provided in two ways:

* Check out the Google Test sources into `benchmark/googletest` as above.
* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during
  configuration, the library will automatically download and build any required
  dependencies.

If you do not wish to build and run the tests, add `-DBENCHMARK_ENABLE_GTEST_TESTS=OFF`
to `CMAKE_ARGS`.


## Installation Guide

For Ubuntu and Debian-based systems:

First, make sure you have git and cmake installed (if not, install them):

```
sudo apt-get install git
sudo apt-get install cmake
```

Now, let's clone the repository and build it:

```
git clone https://github.com/google/benchmark.git
cd benchmark
git clone https://github.com/google/googletest.git
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE
make
```

We now need to install the library globally:

```
sudo make install
```

Now you have google/benchmark installed on your machine.
Note: don't forget to link against the pthread library while building.

## Stable and Experimental Library Versions

The main branch contains the latest stable version of the benchmarking library;
its API can be considered largely stable, with source-breaking changes
being made only upon the release of a new major version.
Newer, experimental features are implemented and tested on the
[`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish
to use, test, and provide feedback on the new features are encouraged to try
this branch. However, this branch provides no stability guarantees and reserves
the right to change and break the API at any time.

## Prerequisite knowledge

Before attempting to understand this framework, one should ideally have some familiarity with the structure and format of the Google Test framework, upon which it is based. Documentation for Google Test, including a "Getting Started" (primer) guide, is available here:
https://github.com/google/googletest/blob/master/googletest/docs/Documentation.md


## Example usage
### Basic usage
Define a function that executes the code to be measured.

```c++
#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Don't forget to inform your linker to add the benchmark library, e.g.
through the `-lbenchmark` compilation flag. Alternatively, you may leave out the
`BENCHMARK_MAIN();` at the end of the source file and link against
`-lbenchmark_main` to get the same default behavior.

The benchmark library will report the timing of the code within the `for (...)` loop.

### Passing arguments
Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.
```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```
Now the generated arguments are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand.
The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Calculate asymptotic complexity (Big O)
Asymptotic complexity might be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.
```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity might also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code will specify asymptotic complexity with a lambda function,
which might be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; });
```

### Templated benchmarks
Templated benchmarks work the same way: this example produces and consumes
messages of size `sizeof(v)` `range_x` times. It also outputs throughput in the
absence of multiprogramming.
```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

### A Faster KeepRunning loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks.
For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The reason the range-based for loop is faster than using `KeepRunning` is
that `KeepRunning` requires a memory load and store of the iteration count
every iteration, whereas the range-based variant is able to keep the iteration count
in a register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```
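For reference, the legacy `KeepRunning` loop whose generated code is shown above is written at the source level as follows; this is a minimal sketch, with `SlowOperation()` as a hypothetical stand-in for the measured code:

```c++
static void BM_Slow(benchmark::State &state) {
  // Pre-C++11 loop style: each call to KeepRunning() loads and stores the
  // iteration count through the State object, as in the assembly above.
  while (state.KeepRunning()) {
    SlowOperation();
  }
}
BENCHMARK(BM_Slow);
```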
Unless C++03 compatibility is required, the range-based variant of writing
the benchmark loop should be preferred.

## Passing arbitrary arguments to a benchmark
In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
```
Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.
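Because `RegisterBenchmark` returns a pointer to the benchmark object, the usual builder methods can be chained onto it. A minimal sketch, reusing the `BM_memcpy` function defined earlier (the name string here is arbitrary):

```c++
int main(int argc, char** argv) {
  // Register one instance programmatically, then run as usual.
  benchmark::RegisterBenchmark("BM_memcpy/registered", BM_memcpy)->Arg(8);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```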
Unlike the `BENCHMARK` registration macros, which can only be used at global
scope, `RegisterBenchmark` can be called anywhere. This allows
benchmarks to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
    benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```

### Multithreaded benchmarks
In a multithreaded test (a benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against the thread
index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.


## Manual timing
For benchmarking something for which neither CPU time nor real time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real time. Instead, it can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.
```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
        std::chrono::duration_cast<std::chrono::duration<double>>(
            end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

### Preventing optimisation
To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
    int x = 0;
    for (int i = 0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU-based compilers it acts as a read/write barrier
for global memory.
More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of '<expr>' is only reused */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence,
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block-scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU- or MSVC-based compilers.

### Set time unit manually
If a benchmark runs for a few milliseconds, it may be hard to visually compare the
measured times, since the output data is given in nanoseconds by default. To
change this, you can set the time unit manually:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

## Controlling number of iterations
In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the number of
iterations is at least one and not more than 1e9, and iteration continues until the CPU time
exceeds the minimum time, or the wallclock time is 5x the minimum time. The minimum time is
set with the flag `--benchmark_min_time`, or per benchmark by calling `MinTime` on
the registered benchmark object.

## Reporting the mean, median and standard deviation by repeated benchmarks
By default each benchmark is run once and that single result is reported.
However, benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per-benchmark basis by calling
`Repetitions` on the registered benchmark object.
When a benchmark is run more
than once, the mean, median and standard deviation of the runs will be reported.

Additionally, the `--benchmark_report_aggregates_only={true|false}` flag or the
`ReportAggregatesOnly(bool)` function can be used to change how repeated tests
are reported. By default the result of each repeated run is reported. When this
option is `true`, only the mean, median and standard deviation of the runs are reported.
Calling `ReportAggregatesOnly(bool)` on a registered benchmark object overrides
the value of the flag for that benchmark.

## User-defined statistics for repeated benchmarks
While having the mean, median and standard deviation is nice, this may not be
enough for everyone. For example, you may want to know what the largest
observation is, e.g. because you have some real-time constraints. This is easy.
The following code will specify a custom statistic to be calculated, defined
by a lambda function.
```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

## Fixtures
Fixture tests are created by
first defining a type that derives from `::benchmark::Fixture` and then
creating/registering the tests using the following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

### Templated fixtures
You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:
```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

## User-defined counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo, Bar and Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.

The `Counter` constructor accepts two parameters: the value as a `double`
and a bit flag which allows you to show counters as rates and/or as
per-thread averages:

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = Counter(numFoos, benchmark::Counter::kAvgThreadsRate);
```

When compiling in C++11 mode or later, you can use `insert()` with a
`std::initializer_list`:

```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```

### Counter reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as `bytes_processed` and
`items_processed`. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks.
Here's an example of the corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k    2434095 ns   2434186 ns       288 3.1416
BM_CalculatePiRange/1024k   9721140 ns   9721413 ns        71 3.14159
BM_CalculatePi/threads:8       2255 ns      9943 ns     70936
```
Note the additional header printed when the benchmark changes from
`BM_UserCounter` to `BM_Factorial`. This is because `BM_Factorial` does
not have the same counter set as `BM_UserCounter`.

## Exiting Benchmarks in Error

When errors caused by external influences, such as file I/O or network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of
`KeepRunning()` are skipped. For the ranged-for version of the benchmark loop,
users must explicitly exit the loop, otherwise all iterations will be performed.
Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  state.SkipWithError("test will not be entered");
  for (auto _ : state) {
    state.SkipWithError("Failed!");
    break; // REQUIRED to prevent all further iterations.
  }
}
```

## Running a subset of the benchmarks

The `--benchmark_filter=<regex>` option can be used to only run the benchmarks
that match the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```


## Output Formats
The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag to set the format type. `console`
is the default format.

The Console format is intended to be a human readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout.
Example tabular output looks like:
```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s  33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s  237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s  290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

## Output Files
The library supports writing the output of the benchmark to a file specified
by `--benchmark_out=<filename>`. The format of the output can be specified
using `--benchmark_out_format={json|console|csv}`. Specifying
`--benchmark_out` does not suppress the console output.

## Debug vs Release
By default, benchmark builds as a debug library. You will see a warning in the
output when this is the case. To build it as a release library instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

If you are using gcc, you might need to set the `GCC_AR` and `GCC_RANLIB` CMake
cache variables if autodetection fails.
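If autodetection fails, those cache variables can be set explicitly at
configure time. A sketch of such an invocation (the `gcc-ar`/`gcc-ranlib`
installation paths are illustrative and vary by system):

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true \
      -DGCC_AR=/usr/bin/gcc-ar -DGCC_RANLIB=/usr/bin/gcc-ranlib ../benchmark
```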
If you are using clang, you may need to set the `LLVMAR_EXECUTABLE`,
`LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` CMake cache variables.

## Linking against the library

When the library is built using GCC it is necessary to link with `-pthread`,
due to how GCC implements `std::thread`.

For GCC 4.x, failing to link to pthreads will lead to runtime exceptions, not
linker errors. See [issue #67](https://github.com/google/benchmark/issues/67)
for more details.

## Compiler Support

Google Benchmark uses C++11 when building the library. As such we require
a modern C++ toolchain, both compiler and standard library.

The following minimum versions are strongly recommended to build the library:

* GCC 4.8
* Clang 3.4
* Visual Studio 2013
* Intel 2015 Update 1

Anything older *may* work.

Note: Using the library and its headers in C++03 is supported. C++11 is only
required to build the library.

## Disable CPU frequency scaling
If you see this error:
```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
```
you might want to disable the CPU frequency scaling while running the
benchmark:
```bash
sudo cpupower frequency-set --governor performance
./mybench
sudo cpupower frequency-set --governor powersave
```

# Known Issues

### Windows with CMake

* Users must manually link `shlwapi.lib`. Failure to do so may result
in unresolved symbols.

### Solaris

* Users must explicitly link with the kstat library (`-lkstat` compilation flag).
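Both of the platform-specific link requirements above can be handled from a
consuming project's `CMakeLists.txt`. A minimal sketch, assuming a benchmark
executable target named `mybench` (the target name is illustrative):

```
if(WIN32)
  target_link_libraries(mybench shlwapi)
elseif(CMAKE_SYSTEM_NAME STREQUAL "SunOS")
  target_link_libraries(mybench kstat)
endif()
```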