	Interbench - The Linux Interactivity Benchmark


	Introduction

This benchmark application is designed to benchmark interactivity in Linux. See
the file readme.interactivity for a brief definition.

It is designed to measure the effect of changes in Linux kernel design or in
system configuration, such as changes to the cpu scheduler, the I/O scheduler
or the filesystem and their options. With careful benchmarking, different
hardware can also be compared.


	What does it do?

It is designed to emulate the cpu scheduling behaviour of interactive tasks and
measure their scheduling latency and jitter. It does this first with the tasks
running on their own and then in the presence of various background loads.
Both the tasks and the loads can be run at configurable nice levels, and the
benchmarked tasks can optionally be run real time.


	How does it work?

First it benchmarks how best to reproduce a fixed percentage of cpu usage on the
machine being used for the benchmark. It saves the result to a file and reuses
it for all subsequent runs to keep the emulation of cpu usage constant.
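The calibration step can be sketched in C as below. This is a minimal sketch
under stated assumptions: the unit of work is a simple counting loop, the
helpers burn_loops() and calibrate_loops_per_ms() are hypothetical rather than
taken from interbench.c, and saving the result to a file is left out.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical unit of work: one "loop" is one iteration of this body.
 * The volatile sink stops the compiler from optimising the loop away. */
static void burn_loops(unsigned long loops)
{
	volatile unsigned long sink = 0;

	for (unsigned long i = 0; i < loops; i++)
		sink += i;
}

/* Milliseconds between two CLOCK_MONOTONIC timestamps. */
static double elapsed_ms(struct timespec a, struct timespec b)
{
	return (double)(b.tv_sec - a.tv_sec) * 1e3 +
	       (double)(b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Time a fixed batch of loops `runs` times, keep the fastest run (the one
 * least disturbed by other activity) and convert it to loops per ms. */
static unsigned long calibrate_loops_per_ms(unsigned long batch, int runs)
{
	double best_ms = -1.0;

	assert(batch > 0 && runs > 0);
	for (int r = 0; r < runs; r++) {
		struct timespec start, end;

		clock_gettime(CLOCK_MONOTONIC, &start);
		burn_loops(batch);
		clock_gettime(CLOCK_MONOTONIC, &end);
		double ms = elapsed_ms(start, end);
		if (best_ms < 0.0 || ms < best_ms)
			best_ms = ms;
	}
	if (best_ms <= 0.0)
		return batch;	/* timer could not resolve the batch */
	return (unsigned long)((double)batch / best_ms);
}
```

Keeping the fastest of several runs filters out runs slowed by other activity.
A burn of n milliseconds is then roughly burn_loops(n * loops_per_ms), and the
figure is only meaningful on the machine where it was measured.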

It runs a real time, high priority timing thread that wakes up the thread or
threads of the simulated interactive tasks and then measures their scheduling
latency. As there is no accurate timer-driven scheduling in Linux, the timing
thread sleeps as accurately as the Linux kernel supports, and latency is taken
as the time from the end of this sleep until the simulated task gets scheduled.
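A simplified sketch of this latency measurement, assuming a single thread that
sleeps to absolute deadlines with POSIX clock_nanosleep() and records how late
it actually woke. It collapses the separate timing thread and simulated task
into one thread, and measure_wakeup_latency_us() is a hypothetical helper, not
interbench's own code.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance *t by ns nanoseconds, keeping tv_nsec normalised. */
static void timespec_add_ns(struct timespec *t, long ns)
{
	t->tv_nsec += ns;
	while (t->tv_nsec >= NSEC_PER_SEC) {
		t->tv_nsec -= NSEC_PER_SEC;
		t->tv_sec++;
	}
}

/* Sleep until `count` absolute deadlines spaced `period_ns` apart and store
 * how late each wakeup was, in microseconds. Sleeping to absolute times
 * stops lateness accumulating from one period to the next.
 * Returns the worst latency seen, or -1.0 if sleeping fails. */
static double measure_wakeup_latency_us(long period_ns, int count,
					double *samples_us)
{
	struct timespec deadline, now;
	double worst = 0.0;

	assert(period_ns > 0 && count > 0);
	clock_gettime(CLOCK_MONOTONIC, &deadline);
	for (int i = 0; i < count; i++) {
		int rc;

		timespec_add_ns(&deadline, period_ns);
		do {	/* restart if a signal interrupts the sleep */
			rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
					     &deadline, NULL);
		} while (rc == EINTR);
		if (rc != 0)
			return -1.0;
		clock_gettime(CLOCK_MONOTONIC, &now);
		double late = (double)(now.tv_sec - deadline.tv_sec) * 1e6 +
			      (double)(now.tv_nsec - deadline.tv_nsec) / 1e3;
		samples_us[i] = late;
		if (late > worst)
			worst = late;
	}
	return worst;
}
```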

Each benchmarked simulation runs as a separate process with its own threads,
and the background load (if any) also runs as a separate process.


	What interactive tasks are simulated and how?

X:
X is simulated as a thread that uses a variable amount of cpu ranging from 0 to
100%. This simulates an idle gui where a window is grabbed and then dragged
across the screen.
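The readme does not say how the X demand varies over time. A minimal sketch,
assuming a simple linear ramp up and back down; x_cpu_percent() is a
hypothetical helper and not taken from interbench.c.

```c
#include <assert.h>

/* Hypothetical demand schedule for the X simulation: the busy share of each
 * frame ramps linearly from 0% to 100% over `steps` frames and back down,
 * loosely modelling a window being grabbed, dragged and released.
 * Returns the percentage of frame `frame` to spend burning cpu. */
static int x_cpu_percent(int frame, int steps)
{
	assert(frame >= 0 && steps > 0);
	int pos = frame % (2 * steps);	/* position within one up/down cycle */

	if (pos <= steps)
		return pos * 100 / steps;		/* ramping up */
	return (2 * steps - pos) * 100 / steps;	/* ramping down */
}
```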

Audio:
Audio is simulated as a thread that tries to run at 50ms intervals and uses 5%
cpu. This behaviour ignores any caching that would normally be done by well
designed audio applications, but 50ms has been observed as the interval a
popular Linux audio player uses to write to audio cards. It also ignores the
effects of different audio drivers and audio cards. Audio is also benchmarked
running SCHED_FIFO if the real time benchmarking option is used.
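A minimal sketch of switching a task to SCHED_FIFO with the POSIX
sched_setscheduler() call. make_realtime_fifo() is a hypothetical helper, and
the priority interbench actually uses is not stated in this readme.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Put the caller (on Linux, the calling thread) under SCHED_FIFO at static
 * priority `prio`, clamped to the range the system allows. Returns 0 on
 * success or the errno value on failure; EPERM is expected without root or
 * real time privileges (e.g. a zero RLIMIT_RTPRIO). */
static int make_realtime_fifo(int prio)
{
	int min = sched_get_priority_min(SCHED_FIFO);
	int max = sched_get_priority_max(SCHED_FIFO);

	if (min == -1 || max == -1)
		return errno;
	if (prio < min)
		prio = min;
	if (prio > max)
		prio = max;

	struct sched_param sp = { .sched_priority = prio };
	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
		return errno;
	return 0;
}
```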

Video:
Video is simulated as a thread that tries to receive cpu 60 times per second
and uses 40% cpu. This would be quite a demanding video playback at 60fps. Like
the audio simulator it ignores caching, drivers and video cards. As with audio,
video is also benchmarked as real time if the real time option is used.
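As a worked example of the figures above, a periodic task's cpu share splits
every period into a run phase and a sleep phase: audio needs 5% of 50ms, i.e.
2.5ms of cpu 20 times a second, and video needs 40% of 1/60s, about 6.7ms of
cpu 60 times a second. The hypothetical helper budget_for() below only does
that arithmetic; it is not taken from interbench.c.

```c
#include <assert.h>

/* How a periodic task's cpu share splits each period into running and
 * sleeping time, all in microseconds. */
struct period_budget {
	long interval_us;	/* time between wakeups */
	long run_us;		/* cpu to consume in each period */
	long sleep_us;		/* remainder of the period */
};

static struct period_budget budget_for(long interval_us, int cpu_percent)
{
	struct period_budget b;

	assert(interval_us > 0 && cpu_percent >= 0 && cpu_percent <= 100);
	b.interval_us = interval_us;
	b.run_us = interval_us * cpu_percent / 100;	/* rounds down */
	b.sleep_us = interval_us - b.run_us;
	return b;
}
```

For example, budget_for(50000, 5) gives 2500us running and 47500us sleeping,
and budget_for(1000000 / 60, 40) gives 6666us running out of 16666us.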

Gaming:
The cpu usage behind gaming is not at all interactive, yet games clearly are
intended for interactive usage. This load simply uses as much cpu as it can
get. It does not report deadlines met, as there are no deadlines with an
unlocked frame rate in a game. This accurately emulates only a cpu bound 3d
game, not one which is gpu bound (limited purely by the graphics card).
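Because the gaming load has no deadlines, the useful figure is how much cpu it
actually got. One way to measure that, assuming POSIX per-thread cpu clocks are
available, is to compare CLOCK_THREAD_CPUTIME_ID against wall-clock time while
spinning. gaming_cpu_fraction() is a hypothetical helper, not interbench's
method.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <time.h>

static double ts_sec(struct timespec t)
{
	return (double)t.tv_sec + (double)t.tv_nsec / 1e9;
}

/* Spin for `wall_ms` of wall-clock time and return the fraction of that
 * time this thread was actually on a cpu, as measured by its own cpu
 * clock. A value near 1.0 means the scheduler gave it all it asked for. */
static double gaming_cpu_fraction(long wall_ms)
{
	struct timespec w0, w1, c0, c1;

	assert(wall_ms > 0);
	clock_gettime(CLOCK_MONOTONIC, &w0);
	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c0);
	do {	/* cpu bound: keep polling the clock until time is up */
		clock_gettime(CLOCK_MONOTONIC, &w1);
	} while ((ts_sec(w1) - ts_sec(w0)) * 1e3 < (double)wall_ms);
	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c1);

	return (ts_sec(c1) - ts_sec(c0)) / (ts_sec(w1) - ts_sec(w0));
}
```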

Custom:
This load allows you to specify your own combination of cpu percentage and
interval if you have a specific workload you are interested in and know its
cpu usage and frame rate on the hardware you are testing.
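The custom load is given as a cpu percentage (-C) and an interval in
microseconds (-I). If you know a workload's frame rate, the interval is one
second divided by that rate; fps_to_interval_us() is a hypothetical helper for
that conversion.

```c
#include <assert.h>

/* Convert a frame rate into the microsecond interval expected by the
 * custom load's -I option, rounded to the nearest microsecond. */
static long fps_to_interval_us(double fps)
{
	assert(fps > 0.0);
	return (long)(1e6 / fps + 0.5);
}
```

For example, a hypothetical workload measured at 25 frames per second and 30%
cpu would translate to 'interbench -C 30 -I 40000'.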


	What loads are simulated?

None:
Otherwise idle system.

Video:
The video simulation thread is also used as a background load.

X:
The X simulation thread is used as a load.

Burn:
A configurable number of fully cpu bound threads (4 by default).
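A minimal sketch of a Burn load using POSIX threads and a shared stop flag. The
thread count corresponds to the -L option (4 by default), but start_burn() and
stop_burn() are hypothetical names, not interbench's. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int burn_stop;	/* set to 1 to ask the burn threads to exit */

/* Spin until told to stop, staying fully cpu bound. */
static void *burn_thread(void *arg)
{
	unsigned long *iterations = arg;

	while (!atomic_load_explicit(&burn_stop, memory_order_relaxed))
		(*iterations)++;
	return NULL;
}

/* Start up to n burn threads; returns how many were actually started. */
static int start_burn(pthread_t *tids, unsigned long *counts, int n)
{
	int started = 0;

	assert(n > 0);
	atomic_store(&burn_stop, 0);
	for (int i = 0; i < n; i++) {
		counts[i] = 0;
		if (pthread_create(&tids[i], NULL, burn_thread, &counts[i]) != 0)
			break;
		started++;
	}
	return started;
}

/* Ask the first `started` burn threads to stop and wait for them. */
static void stop_burn(pthread_t *tids, int started)
{
	atomic_store(&burn_stop, 1);
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
}
```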

Write:
Repeatedly streaming a file the size of physical ram to disk.
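A minimal sketch of one pass of such a write load, assuming physical ram is
obtained through sysconf(). A real load would call
stream_write(path, phys_ram_bytes()) in a loop; the helpers and file name are
hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Physical ram in bytes via sysconf(); _SC_PHYS_PAGES is a common
 * extension (glibc, the BSDs) rather than strict POSIX. */
static long long phys_ram_bytes(void)
{
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages <= 0 || page_size <= 0)
		return -1;
	return (long long)pages * page_size;
}

/* One pass of the write load: write `total` bytes to `path` in 1 MiB
 * chunks, then fsync() so the data really reaches the disk.
 * Returns 0 on success, -1 on error. */
static int stream_write(const char *path, long long total)
{
	static char buf[1 << 20];
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	memset(buf, 'x', sizeof(buf));
	while (total > 0) {
		size_t chunk = total < (long long)sizeof(buf) ?
			       (size_t)total : sizeof(buf);
		ssize_t n = write(fd, buf, chunk);

		if (n <= 0) {
			close(fd);
			return -1;
		}
		total -= n;
	}
	if (fsync(fd) != 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```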

Read:
Repeatedly reading a file the size of physical ram from disk (the size is
chosen to avoid caching effects).

Compile:
Simulating a heavy 'make -j4' compilation by running Burn, Write and Read
concurrently.
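A minimal sketch of combining loads, assuming each part runs in its own child
process. The readme states that background loads run as a separate process,
but not how Compile arranges its three parts. run_concurrently() and the
placeholder load functions are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*load_fn)(void);

/* Placeholders standing in for the Burn, Write and Read loads. */
static void burn_load(void)
{
	for (volatile unsigned long i = 0; i < 1000000UL; i++)
		;
}
static void write_load(void) { }
static void read_load(void) { }

/* Run each load in its own child process at the same time and wait for
 * all of them. Returns how many children exited with status 0. */
static int run_concurrently(load_fn *loads, int n)
{
	pid_t pids[16];
	int ok = 0;

	assert(n > 0 && n <= 16);
	for (int i = 0; i < n; i++) {
		pids[i] = fork();
		if (pids[i] == 0) {	/* child: run one load, then exit */
			loads[i]();
			_exit(0);
		}
	}
	for (int i = 0; i < n; i++) {
		int status;

		if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```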

Memload:
Simulating heavy memory and swap pressure by repeatedly accessing 110% of
available ram, moving it around and freeing it. Because of the nature of this
load you need some swap enabled; if no swap is detected, this load is
disabled.
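A minimal sketch of sizing the memory load and detecting swap, assuming
"available ram" means total ram as reported by sysinfo(), which is
Linux-specific. memload_target() and memload_bytes() are hypothetical helpers.

```c
#include <assert.h>
#include <sys/sysinfo.h>

/* Bytes the memory load should cycle through: 110% of ram, or 0 when no
 * swap is configured, meaning the load should be disabled. */
static unsigned long long memload_target(unsigned long long ram_bytes,
					 unsigned long long swap_bytes)
{
	if (swap_bytes == 0)
		return 0;
	return ram_bytes / 10 * 11;	/* divide first to avoid overflow */
}

/* Linux-specific: read total ram and swap with sysinfo(2). */
static unsigned long long memload_bytes(void)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return 0;
	return memload_target((unsigned long long)si.totalram * si.mem_unit,
			      (unsigned long long)si.totalswap * si.mem_unit);
}
```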

Hack:
This repeatedly runs the benchmarking program "hackbench" as 'hackbench 50'.
It is suggested for the real time benchmarks only, but because this load is
so extreme it is not unusual for an out-of-memory kill to occur, which will
invalidate any data you get. For this reason it is disabled by default.

Custom:
The custom simulation is used as a load.


	What is measured and what does it mean?

1. The average scheduling latency (time from requesting cpu until actually
getting it) of deadlines met during the test period.
2. The scheduling jitter, represented by the standard deviation of the latency.
3. The maximum latency seen during the test period.
4. The percentage of desired cpu obtained.
5. The percentage of deadlines met.
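A minimal sketch of how the five figures could be computed from per-wakeup
samples, assuming each sample carries its latency in ms and a flag saying
whether its deadline was met, and that cpu obtained and desired are totals in
ms. Following the list above, the average and jitter use only deadlines met
while the maximum covers every sample. summarise() and struct sched_stats are
hypothetical, and the square root uses Newton's method only to keep the sketch
free of -lm.

```c
#include <assert.h>

struct sched_stats {
	double mean_ms;		/* 1. average latency of deadlines met */
	double sd_ms;		/* 2. jitter: standard deviation of that latency */
	double max_ms;		/* 3. maximum latency over all samples */
	double pct_desired;	/* 4. % of desired cpu obtained */
	double pct_met;		/* 5. % of deadlines met */
};

/* Square root by Newton's method, starting at or above the root. */
static double sqrt_newton(double x)
{
	if (x <= 0.0)
		return 0.0;
	double r = x > 1.0 ? x : 1.0;
	for (int i = 0; i < 60; i++)
		r = 0.5 * (r + x / r);
	return r;
}

static struct sched_stats summarise(const double *lat_ms,
				    const unsigned char *met, int n,
				    double cpu_got_ms, double cpu_wanted_ms)
{
	struct sched_stats s = {0};
	double sum = 0.0, sumsq = 0.0;
	int nmet = 0;

	assert(n > 0 && cpu_wanted_ms > 0.0);
	for (int i = 0; i < n; i++) {
		if (lat_ms[i] > s.max_ms)
			s.max_ms = lat_ms[i];
		if (met[i]) {
			sum += lat_ms[i];
			sumsq += lat_ms[i] * lat_ms[i];
			nmet++;
		}
	}
	if (nmet > 0) {
		s.mean_ms = sum / nmet;
		/* population variance: E[x^2] - E[x]^2 */
		s.sd_ms = sqrt_newton(sumsq / nmet - s.mean_ms * s.mean_ms);
	}
	s.pct_desired = 100.0 * cpu_got_ms / cpu_wanted_ms;
	s.pct_met = 100.0 * nmet / n;
	return s;
}
```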

This data is output to the console and saved to a log file stamped with the
kernel name and date. See sample.log.

	Sample:
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load     Latency +/- SD (ms)  Max Latency (ms)  % Desired CPU  % Deadlines Met
None       0.495 +/- 0.495                  45            100               96
Video       11.7 +/- 11.7                 1815           89.6             62.7
Burn        27.9 +/- 28.1                 3335           78.5               44
Write       4.02 +/- 4.03                  372             97             78.7
Read        1.09 +/- 1.09                  158           99.7               88
Compile     28.8 +/- 28.8                 3351           78.2             43.7
Memload     2.81 +/- 2.81                  187           98.7               85

What can be seen here is that at no point during this test run were all of the
so-called deadlines met by the X simulator, although all the desired cpu was
achieved under no load. In X terms this means that every bit of window movement
was drawn while moving the window, but some redraws were delayed and there was
enough time to catch up before the next deadline. In the 'Burn' row we can see
that only 44% of the deadlines were met, and only 78.5% of the desired cpu was
achieved. This means that some deadlines were so late that some redraws were
dropped entirely to catch up, which is why less than the desired cpu was
obtained. In X terms this would translate into jerky movement, in audio it
would be a skip, and in video it would be a dropped frame. Note that despite
the massive maximum latency of more than 3 seconds, the average latency is
still less than 30ms. This is because these sorts of applications usually drop
redraws in order to catch up.


	What is relevant in the data?

The results are considerably more pessimistic than real world behaviour because
they ignore the reality of buffering, but this allows subtle differences to be
picked up more readily. In terms of what would be noticed by the end user,
dropping deadlines would cause noticeable clicks in audio, subtle visible frame
time delays in video, and loss of "smooth" movement in X. Dropping desired cpu
would be much more noticeable, with audio skips, missed video frames or jerks
in window movement under X. The magnitude of these would be best represented
by the maximum latency. When the deadlines are actually met, the average
latency represents how "smooth" it would look. The average human's limit of
perception for jitter is of the order of 7ms; trained audio listeners may
notice considerably smaller jitter.


	How to use it?

In response to criticism of the difficulty of setting up my previous
benchmark, contest, I've made this as simple as possible.

	Short version:
make
./interbench

Please read the longer version before submitting results!

	Longer version:
Build with 'make'. Once built it is a single executable, so if you want to
install it simply copy the interbench binary wherever you like.

To get good reproducible data you should boot into runlevel one so that
nothing else is running on the machine. All power saving (cpu throttling, cpu
frequency modifications) must be disabled on the first run to get an accurate
measurement of cpu usage. You may enable them later if you are benchmarking
their effect on interactivity on that machine. Root is almost mandatory for
this benchmark, or real time privileges at the very least. For the disk loads
you need free disk space, in the directory it is run from, of the order of
twice your physical ram. A default run in v0.21 takes about 15 minutes to
complete, longer if your disk is slow.
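Before a long run it can be worth checking the free space requirement. A
minimal sketch using POSIX statvfs(), assuming the space available to
unprivileged users (f_bavail) is what matters; enough_disk_for_loads() is a
hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Return 1 if the filesystem holding `dir` has at least twice the
 * machine's physical ram free for unprivileged users, 0 if not, and
 * -1 on error. */
static int enough_disk_for_loads(const char *dir)
{
	struct statvfs st;
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGESIZE);

	if (statvfs(dir, &st) != 0 || pages <= 0 || page_size <= 0)
		return -1;
	unsigned long long free_bytes =
		(unsigned long long)st.f_bavail * st.f_frsize;
	unsigned long long ram =
		(unsigned long long)pages * (unsigned long long)page_size;
	return free_bytes >= 2 * ram;
}
```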

As the benchmark bases the work it does on the speed of the hardware, the
results from different hardware cannot be directly compared. However, changes
of kernel, filesystem and options can be compared. To compare different cpus
while keeping the workload constant, use the -l option and pass the value of
"loops_per_ms" from the first hardware tested; this keeps the number of cpu
cycles fairly constant, allowing some comparison. Future versions may add the
option of setting the amount of disk throughput etc.


Command line options supported:
interbench [-l <int>] [-L <int>] [-t <int>] [-B <int>] [-N <int>]
        [-b] [-c] [-r] [-C <int> -I <int>] [-m <comment>]
        [-w <load type>] [-x <load type>] [-W <bench>] [-X <bench>]
        [-h]

 -l     Use <int> loops per sec (default: use saved benchmark)
 -L     Use cpu load of <int> with burn load (default: 4)
 -t     Seconds to run each benchmark (default: 30)
 -B     Nice the benchmarked thread to <int> (default: 0)
 -N     Nice the load thread to <int> (default: 0)
 -b     Benchmark loops_per_ms even if it is already known
 -c     Output to console only (default: use console and logfile)
 -r     Perform real time scheduling benchmarks (default: non-rt)
 -C     Use <int> percentage cpu as a custom load (default: no custom load)
 -I     Use <int> microsecond intervals for custom load (needs -C as well)
 -m     Add <comment> to the log file as a separate line
 -w     Add <load type> to the list of loads to be tested against
 -x     Exclude <load type> from the list of loads to be tested against
 -W     Add <bench> to the list of benchmarks to be tested
 -X     Exclude <bench> from the list of benchmarks to be tested
 -h     Show help
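To illustrate the option grammar above, a minimal sketch of parsing a subset of
it with POSIX getopt(), including the rule that -I needs -C as well.
struct bench_opts and parse_opts() are hypothetical; interbench.c's own
parsing is not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* A subset of the options above; field names and sentinels are hypothetical. */
struct bench_opts {
	long loops_per_ms;	/* -l, 0 = use saved benchmark */
	int burn_load;		/* -L */
	int duration;		/* -t, seconds per benchmark */
	int custom_cpu;		/* -C, -1 = no custom load */
	long custom_interval;	/* -I, microseconds, 0 = unset */
	int realtime;		/* -r */
};

/* Returns 0 on success, -1 on a usage error (including -I without -C). */
static int parse_opts(int argc, char **argv, struct bench_opts *o)
{
	int c;

	*o = (struct bench_opts){ .burn_load = 4, .duration = 30,
				  .custom_cpu = -1 };
	optind = 1;	/* allow the parser to be called more than once */
	while ((c = getopt(argc, argv, "l:L:t:B:N:bcrC:I:m:w:x:W:X:h")) != -1) {
		switch (c) {
		case 'l': o->loops_per_ms = atol(optarg); break;
		case 'L': o->burn_load = atoi(optarg); break;
		case 't': o->duration = atoi(optarg); break;
		case 'C': o->custom_cpu = atoi(optarg); break;
		case 'I': o->custom_interval = atol(optarg); break;
		case 'r': o->realtime = 1; break;
		case '?': return -1;
		default: break;	/* other options accepted but ignored here */
		}
	}
	if (o->custom_interval != 0 && o->custom_cpu < 0)
		return -1;	/* -I needs -C as well */
	return 0;
}
```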

There is one hidden option, -u, which is not supported by default; it emulates
a uniprocessor when run on an smp machine. Support for cpu affinity is not
built in by default because there are multiple versions of the
sched_setaffinity call in glibc which not only accept different variable types
but also take different numbers of arguments on different architectures. For
x86 support you can change the '#if 0' in interbench.c to '#if 1' to enable
the affinity support to be built in. For those very keen, the function on
x86_64 does not have the sizeof argument.
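On current glibc the call has settled on the form
sched_setaffinity(pid, cpusetsize, mask) with the cpu_set_t macros, so a
uniprocessor emulation can be sketched as pinning to a single allowed cpu. This
is a sketch assuming Linux and a modern glibc, not the code behind
interbench's '#if 0' block; pin_to_one_cpu() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to the lowest-numbered cpu it is currently
 * allowed to run on, roughly emulating a uniprocessor on an smp machine.
 * Threads created afterwards inherit the mask. Returns 0 or -1. */
static int pin_to_one_cpu(void)
{
	cpu_set_t allowed, one;

	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;
	CPU_ZERO(&one);
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, &allowed)) {
			CPU_SET(cpu, &one);
			return sched_setaffinity(0, sizeof(one), &one);
		}
	}
	return -1;	/* empty mask: should not happen */
}
```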


Thanks:
For help from Zwane Mwaikambo, Bert Hubert, Seth Arnold, Rik Van Riel,
Nicholas Miell, John Levon, Miguel Freitas and Peter Williams.
Aggelos Economopoulos for contest code, Bob Matthews for irman (mem_load)
code, Rusty Russell for hackbench code and Julien Valroff for manpage.

Sat Mar 4 12:11:34 2006
Con Kolivas < kernel at kolivas dot org >