	Interbench - The Linux Interactivity Benchmark


	Introduction

This benchmark application is designed to benchmark interactivity in Linux. See
the file readme.interactivity for a brief definition.

It is designed to measure the effect of changes in Linux kernel design or in
system configuration, such as changes to the cpu and I/O schedulers, the
filesystem, and their options. With careful benchmarking, different hardware
can be compared.


	What does it do?

It is designed to emulate the cpu scheduling behaviour of interactive tasks and
measure their scheduling latency and jitter. It does this with the tasks
running on their own and then in the presence of various background loads.
Both the tasks and the loads have configurable nice levels, and the benchmarked
tasks can also be run real time.


	How does it work?

First it benchmarks how best to reproduce a fixed percentage of cpu usage on
the machine currently being used for the benchmark. It saves this result to a
file and reuses it for all subsequent runs to keep the emulation of cpu usage
constant.

It runs a high priority real time timing thread that wakes up the thread or
threads of the simulated interactive tasks and then measures the latency until
they are scheduled.
As there is no accurate timer driven scheduling in Linux, the timing thread
sleeps as accurately as the Linux kernel supports, and latency is taken to be
the time from the end of this sleep until the simulated task gets scheduled.

Each benchmarked simulation runs as a separate process with its own threads,
and the background load (if any) also runs as a separate process.


	What interactive tasks are simulated and how?

X:
X is simulated as a thread that uses a variable amount of cpu ranging from 0 to
100%. This simulates an idle gui where a window is grabbed and then dragged
across the screen.

Audio:
Audio is simulated as a thread that tries to run at 50ms intervals and uses 5%
cpu. This behaviour ignores any caching that would normally be done by well
designed audio applications, but 50ms has been observed as the interval a
popular Linux audio player uses to write to audio cards. It also ignores any of
the effects of different audio drivers and audio cards. Audio is also
benchmarked running SCHED_FIFO if the real time benchmarking option is used.

Video:
Video is simulated as a thread that tries to receive cpu 60 times per second
and uses 40% cpu. This would be quite a demanding video playback at 60fps. Like
the audio simulator, it ignores caching, drivers and video cards. As with
audio, video is also benchmarked real time when the real time option is used.
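The audio and video simulations above are periodic tasks: wake up at a fixed
interval, then use a fixed share of that interval as cpu time. The C sketch
below illustrates that pattern and how wake-up latency can be recorded. It is
not interbench's implementation: it assumes each task sleeps to its own
absolute deadline with clock_nanosleep() instead of being woken by a separate
real time timing thread, and it burns its budget by spinning on
CLOCK_MONOTONIC rather than with a calibrated loop count. The names
struct periodic_task, budget_us() and run_periodic() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical description of one periodic simulated task. */
struct periodic_task {
    long interval_us; /* wake-up period, e.g. 50000 for the audio task */
    int cpu_pct;      /* share of each period spent busy, e.g. 5 for audio */
};

/* Busy time per period: audio 50000 us * 5% = 2500 us,
   video (1000000 / 60 = 16666 us) * 40% = 6666 us (integer division). */
long budget_us(const struct periodic_task *t) {
    return t->interval_us * t->cpu_pct / 100;
}

static int64_t ts_to_ns(const struct timespec *ts) {
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

static void add_ns(struct timespec *ts, int64_t ns) {
    int64_t total = ts_to_ns(ts) + ns;
    ts->tv_sec = (time_t)(total / 1000000000LL);
    ts->tv_nsec = (long)(total % 1000000000LL);
}

/* Spin until 'us' microseconds of monotonic (wall) time have passed.
   Simplification: time spent preempted also counts towards the budget. */
static void busy_wait_us(long us) {
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (ts_to_ns(&now) - ts_to_ns(&start) < (int64_t)us * 1000);
}

/* For each period: sleep until the absolute deadline, record how late the
   wake-up was (scheduling latency in us), then burn the cpu budget.
   clock_nanosleep() errors such as EINTR are ignored in this sketch. */
void run_periodic(const struct periodic_task *t, int periods,
                  long *latency_us) {
    struct timespec deadline, woke;
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    for (int i = 0; i < periods; i++) {
        add_ns(&deadline, (int64_t)t->interval_us * 1000);
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
        clock_gettime(CLOCK_MONOTONIC, &woke);
        latency_us[i] = (long)((ts_to_ns(&woke) - ts_to_ns(&deadline)) / 1000);
        busy_wait_us(budget_us(t));
    }
}
```

Sleeping to absolute deadlines (TIMER_ABSTIME) rather than for a relative
interval means a late wake-up does not push every later deadline back, so a
task that overruns simply sees its next wake-up arrive late, which shows up as
higher latency.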

Gaming:
The cpu usage behind gaming is not at all interactive, yet games are clearly
intended for interactive use. This load simply uses as much cpu as it can get.
It does not report deadlines met, as there are no deadlines in a game with an
unlocked frame rate. This does not accurately emulate a 3d game that is gpu
bound (limited purely by the graphics card), only a cpu bound one.

Custom:
This allows you to specify your own combination of cpu percentage and interval
if you have a specific workload you are interested in and know its cpu usage
and frame rate on the hardware you are testing.


	What loads are simulated?

None:
Otherwise idle system.

Video:
The video simulation thread is also used as a background load.

X:
The X simulation thread is used as a load.

Burn:
A configurable number of fully cpu bound threads (4 by default).

Write:
Repeated streaming writes to disk of a file the size of physical ram.

Read:
Repeated reads from disk of a file the size of physical ram (to avoid any
caching effects).

Compile:
Simulates a heavy 'make -j4' compilation by running Burn, Write and Read
concurrently.

Memload:
Simulates heavy memory and swap pressure by repeatedly accessing 110% of
available ram, moving it around and freeing it. Due to the nature of this load
you need to have some swap enabled; if no swap is detected, this load is
disabled.

Hack:
This repeatedly runs the benchmarking program "hackbench" as 'hackbench 50'.
It is suggested only as a load for real time benchmarking, but because this
load is so extreme it is not unusual for an out-of-memory kill to occur, which
will invalidate any data you get. For this reason it is disabled by default.

Custom:
The custom simulation is used as a load.


	What is measured and what does it mean?

1. The average scheduling latency (time from requesting cpu until actually
getting it) of deadlines met during the test period.
2. The scheduling jitter, represented by the standard deviation of the latency.
3. The maximum latency seen during the test period.
4. The percentage of desired cpu obtained.
5. The percentage of deadlines met.

This data is output to the console and saved to a file which is stamped with
the kernel name and date. See sample.log.
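The five values listed above can be derived from the raw samples of a run. The
following C sketch is illustrative only: struct sched_stats and summarise() are
hypothetical names, it uses the population standard deviation for jitter
(interbench's own estimator may differ), it assumes latencies are non-negative,
and it leaves it to the caller to decide which samples to pass in (for item 1,
only those from deadlines that were met).

```c
#include <stddef.h>

/* Hypothetical summary of one benchmark run. Latencies are in ms. */
struct sched_stats {
    double mean_ms;           /* 1. average latency */
    double sd_ms;             /* 2. jitter: standard deviation of latency */
    double max_ms;            /* 3. maximum latency */
    double pct_desired_cpu;   /* 4. cpu obtained / cpu desired * 100 */
    double pct_deadlines_met; /* 5. deadlines met / deadlines * 100 */
};

/* Newton-Raphson square root, used so the sketch does not need -lm.
   Starting at or above sqrt(x), the iterates decrease until they converge. */
static double sqrt_newton(double x) {
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (;;) {
        double next = 0.5 * (r + x / r);
        if (next >= r)
            return r;
        r = next;
    }
}

struct sched_stats summarise(const double *latency_ms, size_t n,
                             double cpu_obtained, double cpu_desired,
                             unsigned long met, unsigned long total) {
    struct sched_stats s = {0};
    if (n > 0) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            sum += latency_ms[i];
            if (latency_ms[i] > s.max_ms) /* assumes latencies >= 0 */
                s.max_ms = latency_ms[i];
        }
        s.mean_ms = sum / (double)n;
        /* Second pass: population variance around the mean. */
        double sq = 0.0;
        for (size_t i = 0; i < n; i++) {
            double d = latency_ms[i] - s.mean_ms;
            sq += d * d;
        }
        s.sd_ms = sqrt_newton(sq / (double)n);
    }
    if (cpu_desired > 0.0)
        s.pct_desired_cpu = 100.0 * cpu_obtained / cpu_desired;
    if (total > 0)
        s.pct_deadlines_met = 100.0 * (double)met / (double)total;
    return s;
}
```

Taking the latencies as a plain array keeps the statistics separate from how
the samples are collected; any periodic task loop that records its wake-up
latency could feed it.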

	Sample:
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load      Latency +/- SD (ms)  Max Latency   % Desired CPU  % Deadlines Met
None      0.495 +/- 0.495      45            100            96
Video     11.7 +/- 11.7        1815          89.6           62.7
Burn      27.9 +/- 28.1        3335          78.5           44
Write     4.02 +/- 4.03        372           97             78.7
Read      1.09 +/- 1.09        158           99.7           88
Compile   28.8 +/- 28.8        3351          78.2           43.7
Memload   2.81 +/- 2.81        187           98.7           85

What can be seen here is that at no point during this test run were all the
so-called deadlines met by the X simulator, although all the desired cpu was
achieved under no load. In X terms this means that every bit of window movement
was drawn while moving the window, but some redraws were delayed and there was
enough time to catch up before the next deadline. In the 'Burn' row we can see
that only 44% of the deadlines were met, and only 78.5% of the desired cpu was
achieved. The shortfall in cpu means that some deadlines were so late that some
redraws were dropped entirely to catch up. In X terms this would translate into
jerky movement, in audio it would be a skip, and in video it would be a dropped
frame. Note that despite the massive maximum latency of >3 seconds, the average
latency is still less than 30ms. This is because these sorts of applications
usually drop redraws in order to catch up.


	What is relevant in the data?

The results are considerably more pessimistic than real world behaviour because
they ignore the reality of buffering, but this allows subtle differences to be
picked up more readily. In terms of what the end user would notice, missed
deadlines would cause noticeable clicks in audio, subtle visible frame time
delays in video, and loss of "smooth" movement in X. Failing to get the desired
cpu would be much more noticeable, with audio skips, missed video frames or
jerky window movement under X. The magnitude of these effects is best
represented by the maximum latency. When the deadlines are actually met, the
average latency represents how "smooth" it would look. The average human's
limit of perception for jitter is of the order of 7ms; trained audio observers
may notice much smaller jitter.


	How to use it?

In response to criticism of the difficulty of setting up my previous
benchmark, contest, I've made this one as simple as possible.

	Short version:
make
./interbench

Please read the long version before submitting results!

	Longer version:
Build with 'make'. It is a single executable once built, so if you wish to
install it simply copy the interbench binary wherever you like.

To get good reproducible data you should boot into runlevel one so that
nothing else is running on the machine.
All power saving features (cpu throttling, cpu frequency scaling) must be
disabled on the first run to get an accurate measurement of cpu usage. You may
enable them later if you are benchmarking their effect on interactivity on
that machine. Root is almost mandatory for this benchmark, or at the very least
real time privileges. For the disk loads you need free disk space, in the
directory it is run from, of the order of twice your physical ram. A default
run in v0.21 takes about 15 minutes to complete, longer if your disk is slow.

As the benchmark bases the work it does on the speed of the hardware, results
from different hardware cannot be directly compared. However, changes of
kernels, filesystems and options can be compared. To compare different cpus
while keeping the workload constant, pass the value of "loops_per_ms" from the
first machine tested with the -l option; this keeps the number of cpu cycles
fairly constant and allows some comparison. Future versions may add options
for fixing other quantities, such as disk throughput.
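The "loops_per_ms" value is a calibration: roughly how many iterations of a
fixed busy loop the cpu completes per millisecond. A minimal C sketch of such
a calibration follows, assuming a simple doubling search timed with
CLOCK_MONOTONIC. The loop body and the names burn_loops(),
calibrate_loops_per_ms() and burn_ms() are hypothetical and are not
interbench's own code, so numbers from this sketch are not interchangeable
with interbench's saved benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* A fixed unit of cpu work. The volatile accumulator stops the compiler
   from optimising the loop away. */
static void burn_loops(uint64_t loops) {
    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < loops; i++)
        sink += i;
}

/* Double the loop count until one run takes at least 10 ms, then scale the
   result to loops per millisecond. Best run on an otherwise idle machine
   with frequency scaling disabled, as advised above. */
uint64_t calibrate_loops_per_ms(void) {
    uint64_t loops = 1000;
    for (;;) {
        uint64_t start = now_ns();
        burn_loops(loops);
        uint64_t elapsed = now_ns() - start;
        if (elapsed >= 10000000ULL)
            return loops * 1000000ULL / elapsed;
        loops *= 2;
    }
}

/* Burn roughly 'ms' milliseconds of cpu, given a loops_per_ms value that was
   either calibrated locally or supplied from another machine. */
void burn_ms(uint64_t loops_per_ms, uint64_t ms) {
    burn_loops(loops_per_ms * ms);
}
```

On a second machine, passing the first machine's value to burn_ms() instead of
calling calibrate_loops_per_ms() keeps the number of loop iterations per
simulated millisecond the same, which is the idea behind the rough cross-cpu
comparison described above.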


Command line options supported:
interbench [-l <int>] [-L <int>] [-t <int>] [-B <int>] [-N <int>]
	[-b] [-c] [-r] [-C <int> -I <int>] [-m <comment>]
	[-w <load type>] [-x <load type>] [-W <bench>] [-X <bench>]
	[-h]

 -l	Use <int> loops per ms (default: use saved benchmark)
 -L	Use cpu load of <int> with burn load (default: 4)
 -t	Seconds to run each benchmark (default: 30)
 -B	Nice the benchmarked thread to <int> (default: 0)
 -N	Nice the load thread to <int> (default: 0)
 -b	Benchmark loops_per_ms even if it is already known
 -c	Output to console only (default: use console and logfile)
 -r	Perform real time scheduling benchmarks (default: non-rt)
 -C	Use <int> percentage cpu as a custom load (default: no custom load)
 -I	Use <int> microsecond intervals for custom load (needs -C as well)
 -m	Add <comment> to the log file as a separate line
 -w	Add <load type> to the list of loads to be tested against
 -x	Exclude <load type> from the list of loads to be tested against
 -W	Add <bench> to the list of benchmarks to be tested
 -X	Exclude <bench> from the list of benchmarks to be tested
 -h	Show help

There is one hidden option, -u, which is not supported by default; it emulates
a uniprocessor when run on an SMP machine.
Support for cpu affinity is not built in by default because there are multiple
versions of the sched_setaffinity call in glibc, which not only accept
different variable types but also take different numbers of arguments on
different architectures. For x86 you can change the '#if 0' in interbench.c to
'#if 1' to build in affinity support. For those very keen, note that the
function on x86_64 does not have the sizeof argument.


Thanks:
For help from Zwane Mwaikambo, Bert Hubert, Seth Arnold, Rik Van Riel,
Nicholas Miell, John Levon, Miguel Freitas and Peter Williams.
Aggelos Economopoulos for contest code, Bob Matthews for irman (mem_load)
code, Rusty Russell for hackbench code and Julien Valroff for the manpage.

Sat Mar 4 12:11:34 2006
Con Kolivas < kernel at kolivas dot org >