	Interbench - The Linux Interactivity Benchmark


	Introduction

This benchmark application is designed to benchmark interactivity in Linux. See
the file readme.interactivity for a brief definition.

It is designed to measure the effect of changes in Linux kernel design or in
system configuration, such as cpu, I/O scheduler and filesystem changes and
options. With careful benchmarking, different hardware can be compared.


	What does it do?

It is designed to emulate the cpu scheduling behaviour of interactive tasks and
measure their scheduling latency and jitter. It does this with the tasks on
their own and then in the presence of various background loads. Both the tasks
and the loads have configurable nice levels, and the benchmarked tasks can be
run real time.


	How does it work?

First it benchmarks how best to reproduce a fixed percentage of cpu usage on the
machine currently being used for the benchmark. It saves the result to a file
and reuses it for all subsequent runs to keep the emulation of cpu usage
constant.
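
The calibration step can be pictured with the following minimal sketch. It is
not interbench's own code (which is written in C); the function names, the
sample size and the choice of keeping the fastest timing are all assumptions
made for illustration.

```python
import time


def burn_loops(n):
    """Spin through n iterations of trivial work and return the count."""
    x = 0
    for _ in range(n):
        x += 1
    return x


def calibrate_loops_per_ms(sample_loops=200_000, repeats=5):
    """Estimate how many burn_loops iterations fit in one millisecond.

    Keeps the fastest of several timings, on the assumption that slower
    runs were disturbed by other activity on the machine.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        burn_loops(sample_loops)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from the clock
    return max(1, int(sample_loops / (best * 1000.0)))
```

With a value from calibrate_loops_per_ms() in hand, burn_loops(5 * loops_per_ms)
would occupy the cpu for roughly 5ms on the same machine, which is the kind of
fixed unit of work the saved figure makes reproducible.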

It runs a real time high priority timing thread that wakes up the thread or
threads of the simulated interactive tasks and then measures the latency in the
time taken to schedule. As there is no accurate timer driven scheduling in
Linux, the timing thread sleeps as accurately as the Linux kernel supports, and
latency is considered to be the time from this sleep until the simulated task
gets scheduled.

Each benchmarked simulation runs as a separate process with its own threads,
and the background load (if any) also runs as a separate process.
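
As a rough illustration of the measurement, the sketch below sleeps until each
periodic deadline and records how late the wake-up was. It is a single-threaded
simplification written for this readme, not the separate timing thread and
task threads described above, and the function name and defaults are
assumptions.

```python
import time


def measure_wakeup_latencies(interval_s=0.005, count=20):
    """Return wake-up latencies in milliseconds for `count` periodic deadlines.

    Each deadline falls `interval_s` after the previous one. The latency is
    how long after its deadline the caller actually got to run again.
    """
    latencies_ms = []
    deadline = time.monotonic() + interval_s
    for _ in range(count):
        remaining = deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        woke = time.monotonic()
        # Clamp at zero in case the clock reports a marginally early wake-up.
        latencies_ms.append(max(0.0, (woke - deadline) * 1000.0))
        deadline += interval_s
    return latencies_ms
```

Running this on an idle machine and again under a heavy load gives a feel for
why the background loads matter: the same sleep requests come back later when
the scheduler has other work competing for the cpu.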


	What interactive tasks are simulated and how?

X:
X is simulated as a thread that uses a variable amount of cpu ranging from 0 to
100%. This simulates an idle gui where a window is grabbed and then dragged
across the screen.

Audio:
Audio is simulated as a thread that tries to run at 50ms intervals and then
requires 5% cpu. This behaviour ignores any caching that would normally be done
by well designed audio applications, but 50ms has been observed as the interval
at which a popular Linux audio player writes to audio cards. It also ignores
any of the effects of different audio drivers and audio cards. Audio is also
benchmarked running SCHED_FIFO if the real time benchmarking option is used.

Video:
Video is simulated as a thread that tries to receive cpu 60 times per second
and uses 40% cpu. This would be quite a demanding video playback at 60fps. Like
the audio simulator it ignores caching, drivers and video cards. As with audio,
video is also benchmarked running SCHED_FIFO if the real time option is used.

Gaming:
The cpu usage behind gaming is not at all interactive, yet games clearly are
intended for interactive usage. This load simply uses as much cpu as it can
get. It does not report deadlines met as there are no deadlines with an
unlocked frame rate in a game. This does not accurately emulate a 3d game
which is gpu bound (limited purely by the graphics card), only a cpu bound
one.

Custom:
This load allows you to specify your own combination of cpu percentage and
intervals if you have a specific workload you are interested in and know its
cpu usage and frame rate on the hardware you are testing.
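
The cpu percentage and interval of a simulated task together fix how much work
it must do each time it wakes. The short worked example below applies that
arithmetic to the audio and video figures given above; the helper name is
hypothetical.

```python
def burn_time_us(cpu_percent, interval_us):
    """Microseconds of cpu work needed per interval to average cpu_percent."""
    return interval_us * cpu_percent / 100.0


audio_interval_us = 50_000          # one wake-up every 50ms
video_interval_us = 1_000_000 / 60  # 60 wake-ups per second

print(burn_time_us(5, audio_interval_us))          # 2500.0
print(round(burn_time_us(40, video_interval_us)))  # 6667
```

So the audio task needs about 2.5ms of cpu in every 50ms window, and the video
task needs about 6.7ms in every 16.7ms window. A custom workload can be sized
the same way from its known cpu usage and frame rate.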


	What loads are simulated?

None:
Otherwise idle system.

Video:
The video simulation thread is also used as a background load.

X:
The X simulation thread is used as a load.

Burn:
A configurable number of fully cpu bound threads (4 by default).

Write:
A repeated streaming write to disk of a file the size of physical ram.

Read:
Repeatedly reading a file the size of physical ram from disk (to avoid any
caching effects).

Compile:
Simulating a heavy 'make -j4' compilation by running Burn, Write and Read
concurrently.

Memload:
Simulating heavy memory and swap pressure by repeatedly accessing 110% of
available ram, moving it around and freeing it. You need to have some swap
enabled due to the nature of this load; if no swap is detected, this load is
disabled.

Hack:
This repeatedly runs the benchmarking program "hackbench" as 'hackbench 50'.
This is suggested as a real time load only, but because this load is so extreme
it is not unusual for an out-of-memory kill to occur, which will invalidate any
data you get. For this reason it is disabled by default.

Custom:
The custom simulation is used as a load.


	What is measured and what does it mean?

1. The average scheduling latency (time from requesting cpu until actually
getting it) of deadlines met during the test period.
2. The scheduling jitter, represented by the standard deviation of the latency.
3. The maximum latency seen during the test period.
4. Percentage of desired cpu.
5. Percentage of deadlines met.

This data is output to console and saved to a file which is stamped with the
kernel name and date. See sample.log.

	Sample:
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load     Latency +/- SD (ms)  Max Latency  % Desired CPU  % Deadlines Met
None       0.495 +/- 0.495             45            100               96
Video       11.7 +/- 11.7            1815           89.6             62.7
Burn        27.9 +/- 28.1            3335           78.5               44
Write       4.02 +/- 4.03             372             97             78.7
Read        1.09 +/- 1.09             158           99.7               88
Compile     28.8 +/- 28.8            3351           78.2             43.7
Memload     2.81 +/- 2.81             187           98.7               85

What can be seen here is that the X simulator never met all of its so called
deadlines during this test run, although all the desired cpu was achieved
under no load. In X terms this means that every bit of window movement was
drawn while moving the window, but some redraws were delayed and there was
enough time to catch up before the next deadline. In the 'Burn' row we can see
that only 44% of the deadlines were met, and only 78.5% of the desired cpu was
achieved. This means that some deadlines were so late (% deadlines met was low)
that some redraws were dropped entirely to catch up. In X terms this would
translate into jerky movement, in audio it would be a skip, and in video it
would be a dropped frame. Note that despite the massive maximum latency of more
than 3 seconds, the average latency is still less than 30ms. This is because
these sorts of applications usually drop redraws in order to catch up.
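
The five reported figures could be derived from per-deadline records roughly
as in the sketch below. The record layout (a latency plus a met/missed flag),
the use of a population standard deviation, and taking the maximum over all
records rather than only the met ones are assumptions for illustration, not
details taken from the interbench source.

```python
import math


def summarise(records, achieved_cpu, desired_cpu):
    """Summarise one run.

    records is a list of (latency_ms, met) pairs, one per deadline.
    achieved_cpu and desired_cpu are in the same (arbitrary) unit.
    """
    met = [latency for latency, ok in records if ok]
    n = len(met)
    mean = sum(met) / n if n else 0.0
    variance = sum((x - mean) ** 2 for x in met) / n if n else 0.0
    return {
        # Average and jitter are taken over deadlines that were met.
        "avg_latency_ms": mean,
        "sd_latency_ms": math.sqrt(variance),
        # The maximum is taken over every record, met or not.
        "max_latency_ms": max((lat for lat, _ in records), default=0.0),
        "pct_desired_cpu": 100.0 * achieved_cpu / desired_cpu,
        "pct_deadlines_met": 100.0 * n / len(records) if records else 0.0,
    }
```

Under these assumptions a single very late deadline raises the maximum latency
without moving the average, which matches the pattern in the sample where a
maximum of several seconds sits beside an average under 30ms.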


	What is relevant in the data?

The results are considerably more pessimistic than what happens in the real
world because they ignore the reality of buffering, but this allows us to pick
up subtle differences more readily. In terms of what would be noticed by the
end user, dropping deadlines would make noticeable clicks in audio, subtle
visible frame time delays in video, and loss of "smooth" movement in X.
Dropping desired cpu would be much more noticeable, with audio skips, missed
video frames or jerks in window movement under X. The magnitude of these would
be best represented by the maximum latency. When the deadlines are actually
met, the average latency represents how "smooth" it would look. The average
human's limit of perception for jitter is in the order of 7ms. Trained audio
observers might notice much less.
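
As a small worked example, the sketch below applies the approximate 7ms
perception figure to the standard deviations from the sample run. The numbers
are copied from the sample table; the function name and the idea of using the
figure as a hard cut-off are assumptions for illustration only.

```python
PERCEPTION_LIMIT_MS = 7.0  # approximate figure quoted in this readme

# (load, average latency in ms, standard deviation in ms) from the sample run
SAMPLE = [
    ("None", 0.495, 0.495),
    ("Video", 11.7, 11.7),
    ("Burn", 27.9, 28.1),
    ("Write", 4.02, 4.03),
    ("Read", 1.09, 1.09),
    ("Compile", 28.8, 28.8),
    ("Memload", 2.81, 2.81),
]


def loads_with_perceptible_jitter(rows, limit_ms=PERCEPTION_LIMIT_MS):
    """Return the loads whose latency standard deviation exceeds limit_ms."""
    return [load for load, _avg, sd in rows if sd > limit_ms]


print(loads_with_perceptible_jitter(SAMPLE))  # ['Video', 'Burn', 'Compile']
```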


	How to use it?

In response to criticism of the difficulty of setting up my previous benchmark,
contest, I've made this as simple as possible.

	Short version:
make
./interbench

Please read the long version before submitting results!

	Longer version:
Build with 'make'. It is a single executable once built, so if you wish to
install it simply copy the interbench binary wherever you like.

To get good reproducible data from it you should boot into runlevel one so
that nothing else is running on the machine. All power saving (cpu throttling,
cpu frequency modifications) must be disabled on the first run to get an
accurate measurement for cpu usage. You may enable them later if you are
benchmarking their effect on interactivity on that machine. Root is almost
mandatory for this benchmark, or real time privileges at the very least. You
need free disk space in the directory it is being run in of the order of twice
your physical ram for the disk loads. A default run in v0.21 takes about 15
minutes to complete, longer if your disk is slow.
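
A pre-flight check for the disk space requirement might look like the sketch
below. It is not part of interbench; the function names are hypothetical, and
the sysconf names used to read physical ram are those available on Linux.

```python
import os
import shutil


def physical_ram_bytes():
    """Total physical ram, using sysconf names available on Linux."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def free_bytes_here(path="."):
    """Free space on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def has_room_for_disk_loads(free_bytes, ram_bytes, factor=2):
    """True if free space is at least `factor` times physical ram."""
    return free_bytes >= factor * ram_bytes
```

Calling has_room_for_disk_loads(free_bytes_here(), physical_ram_bytes()) from
the directory where the benchmark will run gives a quick yes or no before
committing to a run of 15 minutes or more.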

As the benchmark bases the work it does on the speed of the hardware, the
results from different hardware cannot be directly compared. However, changes
of kernels, filesystems and options can be compared. To compare different cpus
while keeping the workload constant, use the -l option and pass the value of
"loops_per_ms" from the first hardware tested. This keeps the number of cpu
cycles fairly constant, allowing some comparison. Future versions may add the
option of setting the amount of disk throughput etc.


Command line options supported:
interbench [-l <int>] [-L <int>] [-t <int>] [-B <int>] [-N <int>]
        [-b] [-c] [-r] [-C <int> -I <int>] [-m <comment>]
        [-w <load type>] [-x <load type>] [-W <bench>] [-X <bench>]
        [-h]

 -l     Use <int> loops per ms (default: use saved benchmark)
 -L     Use cpu load of <int> with burn load (default: 4)
 -t     Seconds to run each benchmark (default: 30)
 -B     Nice the benchmarked thread to <int> (default: 0)
 -N     Nice the load thread to <int> (default: 0)
 -b     Benchmark loops_per_ms even if it is already known
 -c     Output to console only (default: use console and logfile)
 -r     Perform real time scheduling benchmarks (default: non-rt)
 -C     Use <int> percentage cpu as a custom load (default: no custom load)
 -I     Use <int> microsecond intervals for custom load (needs -C as well)
 -m     Add <comment> to the log file as a separate line
 -w     Add <load type> to the list of loads to be tested against
 -x     Exclude <load type> from the list of loads to be tested against
 -W     Add <bench> to the list of benchmarks to be tested
 -X     Exclude <bench> from the list of benchmarks to be tested
 -h     Show help
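
For scripted runs, a command line can be assembled from a few of the options
above. The helper below is a hypothetical convenience wrapper, not something
shipped with interbench; it only uses flags listed above, and the value 12345
in the usage note is a made-up placeholder for a real loops_per_ms figure.

```python
def interbench_args(loops_per_ms=None, seconds=None, realtime=False,
                    custom_cpu=None, custom_interval_us=None, comment=None):
    """Build an argv list for ./interbench from a subset of its options."""
    if custom_interval_us is not None and custom_cpu is None:
        # The option list states that -I needs -C as well.
        raise ValueError("-I needs -C as well")
    args = ["./interbench"]
    if loops_per_ms is not None:
        args += ["-l", str(loops_per_ms)]
    if seconds is not None:
        args += ["-t", str(seconds)]
    if realtime:
        args.append("-r")
    if custom_cpu is not None:
        args += ["-C", str(custom_cpu)]
    if custom_interval_us is not None:
        args += ["-I", str(custom_interval_us)]
    if comment is not None:
        args += ["-m", comment]
    return args
```

For example, interbench_args(loops_per_ms=12345, custom_cpu=25,
custom_interval_us=20000) yields a list that could be handed to
subprocess.run to benchmark a custom 25% load at 20ms intervals while reusing
a loops_per_ms value measured on another machine.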

There is one hidden option which is not supported by default, -u, which
emulates a uniprocessor when run on an smp machine. Support for cpu affinity is
not built in by default because there are multiple versions of the
sched_setaffinity call in glibc that not only accept different variable types
but also take different numbers of arguments across architectures. For x86
support you can change the '#if 0' in interbench.c to '#if 1' to build in the
affinity support. For those very keen, the function on x86_64 does not have the
sizeof argument.
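
The effect of pinning a process to one cpu can be tried without rebuilding
anything by using Python's wrapper around the same system call. This is only
an illustration of the idea behind -u, not interbench's implementation;
os.sched_setaffinity exists on Linux only, and the function names here are
hypothetical.

```python
import os


def pin_to_single_cpu():
    """Restrict this process to one cpu.

    Returns (cpu, previous_set) on success, or None where affinity calls
    are unavailable or refused.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        previous = os.sched_getaffinity(0)
        cpu = min(previous)  # pick the lowest cpu we are allowed to use
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return None
    return cpu, previous


def restore_affinity(previous):
    """Put back an affinity set saved by pin_to_single_cpu."""
    os.sched_setaffinity(0, previous)
```

Child processes inherit the affinity mask, so a program started after
pin_to_single_cpu() succeeds would also be confined to that one cpu.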


Thanks:
For help from Zwane Mwaikambo, Bert Hubert, Seth Arnold, Rik Van Riel,
Nicholas Miell, John Levon, Miguel Freitas and Peter Williams.
Aggelos Economopoulos for contest code, Bob Matthews for irman (mem_load)
code, Rusty Russell for hackbench code and Julien Valroff for manpage.

Sat Mar 4 12:11:34 2006
Con Kolivas < kernel at kolivas dot org >