	Interbench - The Linux Interactivity Benchmark


	Introduction

This benchmark application is designed to benchmark interactivity in Linux. See
the file readme.interactivity for a brief definition.

It is designed to measure the effect of changes in Linux kernel design or system
configuration, such as cpu, I/O scheduler and filesystem changes and options.
With careful benchmarking, different hardware can be compared.


	What does it do?

It is designed to emulate the cpu scheduling behaviour of interactive tasks and
measure their scheduling latency and jitter. It does this with the tasks on
their own and then in the presence of various background loads. Both the tasks
and the loads have configurable nice levels, and the benchmarked tasks can be
run real time.


	How does it work?

First it benchmarks how best to reproduce a fixed percentage of cpu usage on the
machine currently being used for the benchmark. It saves this to a file and then
uses this for all subsequent runs to keep the emulation of cpu usage constant.
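
The calibration idea can be illustrated with a minimal Python sketch. It is not
interbench's own C code: the function names, the doubling search and the 50ms
target are assumptions made for this example. It times a busy loop until the
run is long enough to measure, then derives a loops-per-millisecond figure that
can be turned into a fixed cpu share per interval.

```python
import time


def burn(loops):
    """Spin for a fixed number of iterations (pure cpu work)."""
    x = 0
    for i in range(loops):
        x += i
    return x


def calibrate_loops_per_ms(target_ms=50.0, start_loops=10_000):
    """Estimate how many burn() iterations fit in one millisecond.

    Hypothetical helper: doubles the loop count until one timed run
    lasts at least target_ms, then returns loops / elapsed ms.
    """
    loops = start_loops
    while True:
        t0 = time.perf_counter()
        burn(loops)
        elapsed_ms = (time.perf_counter() - t0) * 1000.0
        if elapsed_ms >= target_ms:
            return loops / elapsed_ms
        loops *= 2


def loops_for_cpu_share(loops_per_ms, interval_ms, cpu_percent):
    """Loops to burn per interval to use cpu_percent of that interval."""
    return int(loops_per_ms * interval_ms * cpu_percent / 100.0)
```

With a calibrated figure of 1000 loops per ms, a 5% share of a 50ms interval
works out to loops_for_cpu_share(1000.0, 50.0, 5) == 2500 loops.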

It runs a real time high priority timing thread that wakes up the thread or
threads of the simulated interactive tasks and then measures the latency in the
time taken to schedule. As there is no accurate timer driven scheduling in Linux,
the timing thread sleeps as accurately as the Linux kernel supports, and latency
is considered as the time from this sleep until the simulated task gets
scheduled.
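
A rough Python sketch of this kind of measurement follows. It is only an
illustration under stated assumptions: it uses ordinary threads with no real
time priority, and the names (measure_wakeup_latencies, signalled_at) are
hypothetical rather than taken from interbench.c. A timer loop sleeps, notes
the time, wakes a worker thread, and the worker records how long it took to
actually run.

```python
import threading
import time


def measure_wakeup_latencies(interval_s=0.01, wakeups=20):
    """Return the wake-up latency in ms for each wake-up of a worker thread."""
    wake = threading.Event()
    done = threading.Event()
    latencies_ms = []
    signalled_at = [0.0]  # written by the timer, read by the worker

    def worker():
        for _ in range(wakeups):
            wake.wait()
            wake.clear()
            # Latency: time from the timer signalling until this thread runs.
            latencies_ms.append((time.monotonic() - signalled_at[0]) * 1000.0)
            done.set()

    thread = threading.Thread(target=worker)
    thread.start()
    for _ in range(wakeups):
        time.sleep(interval_s)  # the timer sleeps as accurately as it can
        signalled_at[0] = time.monotonic()
        done.clear()
        wake.set()
        done.wait()  # keep timer and worker in lock step
    thread.join()
    return latencies_ms
```

Calling measure_wakeup_latencies() returns one latency per wake-up; on an idle
machine these are expected to be small, and they grow as competing load is
added.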

Each benchmarked simulation runs as a separate process with its own threads,
and the background load (if any) also runs as a separate process.


	What interactive tasks are simulated and how?

X:
X is simulated as a thread that uses a variable amount of cpu ranging from 0 to
100%. This simulates an idle gui where a window is grabbed and then dragged
across the screen.

Audio:
Audio is simulated as a thread that tries to run at 50ms intervals and then
requires 5% cpu. This behaviour ignores any caching that would normally be done
by well designed audio applications, but 50ms has been seen as the interval used
to write to audio cards by a popular Linux audio player. It also ignores any of
the effects of different audio drivers and audio cards. Audio is also benchmarked
running SCHED_FIFO if the real time benchmarking option is used.

Video:
Video is simulated as a thread that tries to receive cpu 60 times per second
and uses 40% cpu. This would be quite a demanding video playback at 60fps. Like
the audio simulator it ignores caching, drivers and video cards. As per audio,
video is also benchmarked running SCHED_FIFO if the real time option is used.
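
As a worked example of what these figures mean per wake-up, the short Python
sketch below converts an interval and a cpu percentage into cpu time needed per
interval. The helper name cpu_ms_per_wakeup is made up for this illustration;
only the 50ms/5% and 60 per second/40% figures come from the text above.

```python
def cpu_ms_per_wakeup(interval_ms, cpu_percent):
    """Cpu time (ms) a task needs in each interval to hit cpu_percent."""
    return interval_ms * cpu_percent / 100.0


# Audio: wakes every 50ms and needs 5% cpu.
audio_ms = cpu_ms_per_wakeup(50.0, 5)

# Video: wakes 60 times per second and needs 40% cpu.
video_interval_ms = 1000.0 / 60
video_ms = cpu_ms_per_wakeup(video_interval_ms, 40)

print(f"audio: {audio_ms:.2f} ms of cpu every 50.00 ms")
print(f"video: {video_ms:.2f} ms of cpu every {video_interval_ms:.2f} ms")
```

So the audio task needs 2.5ms of cpu in every 50ms window, while the video task
needs roughly 6.67ms in every 16.67ms window, which is why video is the more
demanding of the two.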

Gaming:
The cpu usage behind gaming is not at all interactive, yet games clearly are
intended for interactive usage. This load simply uses as much cpu as it can
get. It does not return deadlines met as there are no deadlines with an
unlocked frame rate in a game. This does not accurately emulate a 3d game
which is gpu bound (limited purely by the graphics card), only a cpu bound
one.

Custom:
This load will allow you to specify your own combination of cpu percentage and
intervals if you have a specific workload you are interested in and know the
cpu usage and frame rate of it on the hardware you are testing.


	What loads are simulated?

None:
Otherwise idle system.

Video:
The video simulation thread is also used as a background load.

X:
The X simulation thread is used as a load.

Burn:
A configurable number of threads fully cpu bound (4 by default).

Write:
Repeatedly streaming a file the size of physical ram to disk.

Read:
Repeatedly reading a file from disk the size of physical ram (to avoid any
caching effects).

Compile:
Simulating a heavy 'make -j4' compilation by running Burn, Write and Read
concurrently.

Memload:
Simulating heavy memory and swap pressure by repeatedly accessing 110% of
available ram and moving it around and freeing it. You need to have some
swap enabled due to the nature of this load, and if it detects no swap this
load is disabled.

Hack:
This repeatedly runs the benchmarking program "hackbench" as 'hackbench 50'.
This is suggested as a real time load only, but because of how extreme this
load is it is not unusual for an out-of-memory kill to occur, which will
invalidate any data you get. For this reason it is disabled by default.

Custom:
The custom simulation is used as a load.


	What is measured and what does it mean?

1. The average scheduling latency (time from requesting cpu until actually
getting it) of deadlines met during the test period.
2. The scheduling jitter, represented by the standard deviation of the latency.
3. The maximum latency seen during the test period.
4. Percentage of desired cpu.
5. Percentage of deadlines met.

This data is output to console and saved to a file which is stamped with the
kernel name and date. See sample.log.
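
The five figures listed above can be sketched in Python as follows. This is an
illustration, not interbench's actual calculation: the function and parameter
names are hypothetical, and the population form of the standard deviation is an
assumption, since this readme does not say which form is used.

```python
import math


def summarise(latencies_ms, deadlines_met, deadlines_total, cpu_got, cpu_wanted):
    """Summarise one benchmark run from per-wake-up latencies (in ms)."""
    n = len(latencies_ms)
    mean = sum(latencies_ms) / n
    # Assumption: population variance (divide by n, not n - 1).
    variance = sum((x - mean) ** 2 for x in latencies_ms) / n
    return {
        "avg_latency_ms": mean,
        "sd_latency_ms": math.sqrt(variance),
        "max_latency_ms": max(latencies_ms),
        "pct_desired_cpu": 100.0 * cpu_got / cpu_wanted,
        "pct_deadlines_met": 100.0 * deadlines_met / deadlines_total,
    }
```

For instance, summarise([1.0, 3.0], 3, 4, 50.0, 100.0) gives an average latency
of 2.0ms, a standard deviation of 1.0ms, a maximum of 3.0ms, 50% of desired cpu
and 75% of deadlines met.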

	Sample:
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load     Latency +/- SD (ms)  Max Latency  % Desired CPU  % Deadlines Met
None       0.495 +/- 0.495             45            100               96
Video       11.7 +/- 11.7            1815           89.6             62.7
Burn        27.9 +/- 28.1            3335           78.5               44
Write       4.02 +/- 4.03             372             97             78.7
Read        1.09 +/- 1.09             158           99.7               88
Compile     28.8 +/- 28.8            3351           78.2             43.7
Memload     2.81 +/- 2.81             187           98.7               85

What can be seen here is that the X simulator never met all of its so-called
deadlines during this test run, although all the desired cpu was achieved
under no load. In X terms this means that every bit of window movement was
drawn while moving the window, but some were delayed and there was enough time
to catch up before the next deadline. In the 'Burn' row we can see that only
44% of the deadlines were met, and only 78.5% of the desired cpu was achieved.
This means that some deadlines were so late (%deadlines met was low) that some
redraws were dropped entirely to catch up. In X terms this would translate into
jerky movement, in audio it would be a skip, and in video it would be a dropped
frame. Note that despite the massive maximum latency of >3 seconds, the average
latency is still less than 30ms. This is because these sorts of applications
usually drop redraws in order to catch up.


	What is relevant in the data?

The results pessimise quite a lot what happens in real world terms because they
ignore the reality of buffering, but this allows us to pick up subtle
differences more readily. In terms of what would be noticed by the end user,
dropping deadlines would make noticeable clicks in audio, subtle visible frame
time delays in video, and loss of "smooth" movement in X. Dropping desired cpu
would be much more noticeable with audio skips, missed video frames or jerks
in window movement under X. The magnitude of these would be best represented by
the maximum latency. When the deadlines are actually met, the average latency
represents how "smooth" it would look. Average humans' limit of perception for
jitter is in the order of 7ms. Trained audio observers might notice much less.


	How to use it?

In response to criticism of difficulty in setting up my previous benchmark,
contest, I've made this as simple as possible.

	Short version:
make
./interbench

Please read the longer version before submitting results!

	Longer version:
Build with 'make'. It is a single executable once built so if you desire to
install it simply copy the interbench binary wherever you like.

To get good reproducible data from it you should boot into runlevel one so
that nothing else is running on the machine. All power saving (cpu throttling,
cpu frequency modifications) must be disabled on the first run to get an
accurate measurement for cpu usage. You may enable them later if you are
benchmarking their effect on interactivity on that machine. Root is almost
mandatory for this benchmark, or real time privileges at the very least. You
need free disk space, in the directory it is being run in, in the order of 2x
your physical ram for the disk loads. A default run in v0.21 takes about 15
minutes to complete, longer if your disk is slow.
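
The disk space requirement can be checked before a run with something like the
Python sketch below. It is an illustration only: the helper names are
hypothetical, and reading physical ram through os.sysconf with SC_PHYS_PAGES
and SC_PAGE_SIZE is an assumption that holds on Linux but may not elsewhere.

```python
import os
import shutil


def physical_ram_bytes():
    """Physical ram in bytes (assumes a Linux-style sysconf)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def has_room_for_disk_loads(free_bytes, ram_bytes, factor=2):
    """True if free space is at least factor times physical ram."""
    return free_bytes >= factor * ram_bytes


def check_directory(path="."):
    """Report whether path has about 2x physical ram of free space."""
    free = shutil.disk_usage(path).free
    ram = physical_ram_bytes()
    ok = has_room_for_disk_loads(free, ram)
    print(f"free: {free} bytes, ram: {ram} bytes, enough: {ok}")
    return ok
```

Running check_directory() in the directory where interbench will be started
gives a quick yes or no before committing to a full run.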

As the benchmark bases the work it does on the speed of the hardware, the
results from different hardware can not be directly compared. However changes
of kernels, filesystem and options can be compared. To do a comparison of
different cpus and keep the workload constant, use the -l option and pass the
value of "loops_per_ms" from the first hardware tested. This will keep the
number of cpu cycles fairly constant, allowing some comparison. Future
versions may add the option of setting the amount of disk throughput etc.


Command line options supported:
interbench [-l <int>] [-L <int>] [-t <int>] [-B <int>] [-N <int>]
        [-b] [-c] [-r] [-C <int> -I <int>] [-m <comment>]
        [-w <load type>] [-x <load type>] [-W <bench>] [-X <bench>]
        [-h]

 -l     Use <int> loops per sec (default: use saved benchmark)
 -L     Use cpu load of <int> with burn load (default: 4)
 -t     Seconds to run each benchmark (default: 30)
 -B     Nice the benchmarked thread to <int> (default: 0)
 -N     Nice the load thread to <int> (default: 0)
 -b     Benchmark loops_per_ms even if it is already known
 -c     Output to console only (default: use console and logfile)
 -r     Perform real time scheduling benchmarks (default: non-rt)
 -C     Use <int> percentage cpu as a custom load (default: no custom load)
 -I     Use <int> microsecond intervals for custom load (needs -C as well)
 -m     Add <comment> to the log file as a separate line
 -w     Add <load type> to the list of loads to be tested against
 -x     Exclude <load type> from the list of loads to be tested against
 -W     Add <bench> to the list of benchmarks to be tested
 -X     Exclude <bench> from the list of benchmarks to be tested
 -h     Show help

There is one hidden option which is not supported by default, -u,
which emulates a uniprocessor when run on an smp machine. The support for cpu
affinity is not built in by default because there are multiple versions of
the sched_setaffinity call in glibc that not only accept different variable
types but across architectures take different numbers of arguments. For x86
support you can change the '#if 0' in interbench.c to '#if 1' to enable the
affinity support to be built in. For those very keen, the function on x86_64
does not have the sizeof argument.
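
To show what pinning to one cpu amounts to, here is a small Python sketch. It
uses Python's os.sched_setaffinity wrapper, not the glibc call variants
discussed above, and it is not the code behind the -u option; the helper name
pin_to_one_cpu and the choice of the lowest allowed cpu are assumptions made
for this example.

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict pid (0 means this process) to a single cpu.

    Returns the chosen cpu number, or None where the affinity calls
    are not available on this platform.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Pick the lowest-numbered cpu the process is currently allowed on.
    cpu = min(os.sched_getaffinity(pid))
    os.sched_setaffinity(pid, {cpu})
    return cpu
```

After pin_to_one_cpu() succeeds, every thread started by the process competes
for the same cpu, which is the uniprocessor-like behaviour being emulated.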


Thanks:
For help from Zwane Mwaikambo, Bert Hubert, Seth Arnold, Rik Van Riel,
Nicholas Miell, John Levon, Miguel Freitas and Peter Williams.
Aggelos Economopoulos for contest code, Bob Matthews for irman (mem_load)
code, Rusty Russell for hackbench code and Julien Valroff for manpage.

Sat Mar 4 12:11:34 2006
Con Kolivas < kernel at kolivas dot org >