	Interbench - The Linux Interactivity Benchmark


	Introduction

This benchmark application is designed to benchmark interactivity in Linux. See
the file readme.interactivity for a brief definition.

It is designed to measure the effect of changes in Linux kernel design or system
configuration, such as cpu, I/O scheduler and filesystem changes and options.
With careful benchmarking, different hardware can be compared.


	What does it do?

It emulates the cpu scheduling behaviour of interactive tasks and measures
their scheduling latency and jitter. It does this with the tasks on their own
and then in the presence of various background loads. Both the tasks and the
loads have configurable nice levels, and the benchmarked tasks can be run real
time.


	How does it work?

First it benchmarks how best to reproduce a fixed percentage of cpu usage on the
machine currently being used for the benchmark. It saves this to a file and then
uses this for all subsequent runs to keep the emulation of cpu usage constant.

It runs a real time high priority timing thread that wakes up the thread or
threads of the simulated interactive tasks and then measures the latency in the
time taken to schedule.
As there is no accurate timer-driven scheduling in Linux, the timing thread
sleeps as accurately as the Linux kernel supports, and latency is considered as
the time from this sleep until the simulated task gets scheduled.

Each benchmarked simulation runs as a separate process with its own threads,
and the background load (if any) also runs as a separate process.


	What interactive tasks are simulated and how?

X:
X is simulated as a thread that uses a variable amount of cpu ranging from 0 to
100%. This simulates an idle gui where a window is grabbed and then dragged
across the screen.

Audio:
Audio is simulated as a thread that tries to run at 50ms intervals and then
requires 5% cpu. This behaviour ignores any caching that would normally be done
by well designed audio applications, but has been seen as the interval used to
write to audio cards by a popular Linux audio player. It also ignores any of the
effects of different audio drivers and audio cards. Audio is also benchmarked
running SCHED_FIFO if the real time benchmarking option is used.

Video:
Video is simulated as a thread that tries to receive cpu 60 times per second
and uses 40% cpu. This would be quite a demanding video playback at 60fps. Like
the audio simulator it ignores caching, drivers and video cards. As with audio,
video is also benchmarked with the real time option.

Gaming:
The cpu usage behind gaming is not at all interactive, yet games clearly are
intended for interactive usage. This load simply uses as much cpu as it can
get. It does not return deadlines met as there are no deadlines with an
unlocked frame rate in a game. This does not accurately emulate a 3d game
which is gpu bound (limited purely by the graphics card), only a cpu bound
one.

Custom:
This load allows you to specify your own combination of cpu percentage and
intervals if you have a specific workload you are interested in and know its
cpu usage and frame rate on the hardware you are testing.


	What loads are simulated?

None:
Otherwise idle system.

Video:
The video simulation thread is also used as a background load.

X:
The X simulation thread is used as a load.

Burn:
A configurable number of fully cpu bound threads (4 by default).

Write:
Repeatedly streaming a file the size of physical ram to disk.

Read:
Repeatedly reading a file the size of physical ram from disk (to avoid any
caching effects).

Compile:
Simulating a heavy 'make -j4' compilation by running Burn, Write and Read
concurrently.

Memload:
Simulating heavy memory and swap pressure by repeatedly accessing 110% of
available ram, moving it around and freeing it. You need to have some swap
enabled due to the nature of this load, and if it detects no swap this load is
disabled.

Hack:
This repeatedly runs the benchmarking program "hackbench" as 'hackbench 50'.
This is suggested as a real time load only, but because of how extreme this
load is, it is not unusual for an out-of-memory kill to occur, which will
invalidate any data you get. For this reason it is disabled by default.

Custom:
The custom simulation is used as a load.


	What is measured and what does it mean?

1. The average scheduling latency (time from requesting cpu until actually
getting it) of deadlines met during the test period.
2. The scheduling jitter, represented by the standard deviation of the latency.
3. The maximum latency seen during the test period.
4. Percentage of desired cpu.
5. Percentage of deadlines met.

This data is output to console and saved to a file which is stamped with the
kernel name and date. See sample.log.
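
As a rough illustration of the figures listed above, the following C sketch
computes the average latency, its standard deviation, the maximum latency and
the percentage of deadlines met from a set of latency samples. It is not taken
from interbench.c: the struct, the function names and the rule that a deadline
counts as met when the latency is below the task's interval are all
assumptions made for this example. The square root is done by hand only so
that the sketch builds without libm; a real program would call sqrt().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result record; the field names are illustrative only. */
struct latency_stats {
	double mean_ms;           /* average scheduling latency */
	double sd_ms;             /* standard deviation, reported as jitter */
	double max_ms;            /* largest latency seen */
	double pct_deadlines_met; /* 100 * met / total */
};

/* Newton's method square root, used only so the sketch needs no libm. */
static double sqrt_newton(double x)
{
	double guess, prev;

	if (x <= 0.0)
		return 0.0;
	/* Start at or above the root so the iteration only decreases. */
	guess = x > 1.0 ? x : 1.0;
	do {
		prev = guess;
		guess = 0.5 * (guess + x / guess);
	} while (guess < prev);
	return guess;
}

/*
 * Summarise n latency samples given in milliseconds. A sample is counted
 * as a met deadline when it is below interval_ms; that rule is an
 * assumption made for this sketch.
 */
struct latency_stats summarise_latency(const double *lat_ms, size_t n,
				       double interval_ms)
{
	struct latency_stats s = { 0.0, 0.0, 0.0, 0.0 };
	double sum = 0.0, sq_dev = 0.0;
	size_t i, met = 0;

	assert(lat_ms != NULL && n > 0);

	for (i = 0; i < n; i++) {
		sum += lat_ms[i];
		if (lat_ms[i] > s.max_ms)
			s.max_ms = lat_ms[i];
		if (lat_ms[i] < interval_ms)
			met++;
	}
	s.mean_ms = sum / (double)n;

	/* Second pass: population standard deviation around the mean. */
	for (i = 0; i < n; i++) {
		double d = lat_ms[i] - s.mean_ms;

		sq_dev += d * d;
	}
	s.sd_ms = sqrt_newton(sq_dev / (double)n);
	s.pct_deadlines_met = 100.0 * (double)met / (double)n;
	return s;
}
```

Calling summarise_latency() on the samples 2, 4, 4, 4, 5, 5, 7 and 9 ms with an
interval of 8 ms gives a mean of 5 ms, a standard deviation of 2 ms, a maximum
of 9 ms and 87.5% of deadlines met.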

	Sample:
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load     Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None       0.495 +/- 0.495         45            100            96
Video       11.7 +/- 11.7        1815           89.6          62.7
Burn        27.9 +/- 28.1        3335           78.5            44
Write       4.02 +/- 4.03         372             97          78.7
Read        1.09 +/- 1.09         158           99.7            88
Compile     28.8 +/- 28.8        3351           78.2          43.7
Memload     2.81 +/- 2.81         187           98.7            85

What can be seen here is that never during this test run were all the so called
deadlines met by the X simulator, although all the desired cpu was achieved
under no load. In X terms this means that every bit of window movement was
drawn while moving the window, but some were delayed and there was enough time
to catch up before the next deadline. In the 'Burn' row we can see that only
44% of the deadlines were met, and only 78.5% of the desired cpu was achieved.
This means that some deadlines were so late (% deadlines met was low) that some
redraws were dropped entirely to catch up. In X terms this would translate into
jerky movement, in audio it would be a skip, and in video it would be a dropped
frame. Note that despite the massive maximum latency of more than 3 seconds,
the average latency is still less than 30ms. This is because these sorts of
applications usually drop redraws in order to catch up.


	What is relevant in the data?

The results are considerably more pessimistic than what happens in the real
world because they ignore the reality of buffering, but this allows us to pick
up subtle differences more readily. In terms of what would be noticed by the
end user, dropping deadlines would make noticeable clicks in audio, subtle
visible frame time delays in video, and loss of "smooth" movement in X.
Dropping desired cpu would be much more noticeable, with audio skips, missed
video frames or jerks in window movement under X. The magnitude of these would
be best represented by the maximum latency. When the deadlines are actually
met, the average latency represents how "smooth" it would look. The average
human's limit of perception for jitter is in the order of 7ms. Trained audio
observers might notice much less.


	How to use it?

In response to criticism of the difficulty in setting up my previous benchmark,
contest, I've made this as simple as possible.

	Short version:
make
./interbench

Please read the long version before submitting results!

	Longer version:
Build with 'make'. It is a single executable once built, so if you desire to
install it simply copy the interbench binary wherever you like.

To get good reproducible data from it you should boot into runlevel one so
that nothing else is running on the machine. All power saving (cpu throttling,
cpu frequency modifications) must be disabled on the first run to get an
accurate measurement for cpu usage. You may enable them later if you are
benchmarking their effect on interactivity on that machine. Root is almost
mandatory for this benchmark, or real time privileges at the very least. For
the disk loads you need free disk space in the directory it is being run in,
in the order of 2 * your physical ram. A default run in v0.21 takes about 15
minutes to complete, longer if your disk is slow.

As the benchmark bases the work it does on the speed of the hardware, the
results from different hardware cannot be directly compared. However, changes
of kernels, filesystem and options can be compared. To compare different cpus
while keeping the workload constant, use the -l option and pass the value of
"loops_per_ms" from the first hardware tested. This keeps the number of cpu
cycles fairly constant, allowing some comparison. Future versions may add the
option of setting the amount of disk throughput etc.
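
To see why reusing one "loops_per_ms" value keeps the workload constant, here
is a minimal C sketch of how a calibrated loop count could be turned into a
fixed amount of work per wakeup. It is only an illustration: the function
names and the value 100000 used below are assumptions, and interbench.c may
well do this differently.

```c
#include <assert.h>

/*
 * Busy-loop iterations needed to use cpu_pct percent of one interval,
 * given a calibrated loops_per_ms. All names here are hypothetical.
 */
unsigned long long loops_per_interval(unsigned long long loops_per_ms,
				      unsigned int cpu_pct,
				      unsigned long long interval_us)
{
	assert(cpu_pct <= 100);
	/* loops/ms * (interval_us / 1000) ms * (cpu_pct / 100) */
	return loops_per_ms * interval_us * cpu_pct / (1000ULL * 100ULL);
}

/*
 * Spin for a fixed number of iterations. The volatile counter stops the
 * compiler from optimising the loop away.
 */
void burn_loops(unsigned long long loops)
{
	volatile unsigned long long i;

	for (i = 0; i < loops; i++)
		;
}
```

With an assumed loops_per_ms of 100000, the audio simulation (5% cpu every
50ms) works out to loops_per_interval(100000, 5, 50000) = 250000 iterations
per wakeup. Passing the same loops_per_ms on a second machine yields the same
iteration count there, whatever that machine's own calibration would have
been.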


Command line options supported:
interbench [-l <int>] [-L <int>] [-t <int>] [-B <int>] [-N <int>]
        [-b] [-c] [-r] [-C <int> -I <int>] [-m <comment>]
        [-w <load type>] [-x <load type>] [-W <bench>] [-X <bench>]
        [-h]

 -l     Use <int> loops per sec (default: use saved benchmark)
 -L     Use cpu load of <int> with burn load (default: 4)
 -t     Seconds to run each benchmark (default: 30)
 -B     Nice the benchmarked thread to <int> (default: 0)
 -N     Nice the load thread to <int> (default: 0)
 -b     Benchmark loops_per_ms even if it is already known
 -c     Output to console only (default: use console and logfile)
 -r     Perform real time scheduling benchmarks (default: non-rt)
 -C     Use <int> percentage cpu as a custom load (default: no custom load)
 -I     Use <int> microsecond intervals for custom load (needs -C as well)
 -m     Add <comment> to the log file as a separate line
 -w     Add <load type> to the list of loads to be tested against
 -x     Exclude <load type> from the list of loads to be tested against
 -W     Add <bench> to the list of benchmarks to be tested
 -X     Exclude <bench> from the list of benchmarks to be tested
 -h     Show help

There is one hidden option which is not supported by default, -u, which
emulates a uniprocessor when run on an smp machine.
The support for cpu affinity is not built in by default because there are
multiple versions of the sched_setaffinity call in glibc that not only accept
different variable types but also take different numbers of arguments across
architectures. For x86 support you can change the '#if 0' in interbench.c to
'#if 1' to enable the affinity support to be built in. For those very keen,
the function on x86_64 does not have the sizeof argument.


Thanks:
For help from Zwane Mwaikambo, Bert Hubert, Seth Arnold, Rik Van Riel,
Nicholas Miell, John Levon, Miguel Freitas and Peter Williams.
Aggelos Economopoulos for contest code, Bob Matthews for irman (mem_load)
code, Rusty Russell for hackbench code and Julien Valroff for manpage.

Sat Mar 4 12:11:34 2006
Con Kolivas < kernel at kolivas dot org >