This is a patched version of zlib modified to use Pentium-optimized
assembly code in the deflation algorithm. The files changed/added by
this patch are:

README.586
match.S

The effectiveness of these modifications is a bit marginal, as the
program's bottleneck seems to be mostly L1-cache contention, which
there is no real way to work around without rewriting the basic
algorithm. The average speedup is around 5-10% (which is generally
less than the amount of variance between subsequent executions).
However, when used at level 9 compression, the cache contention can
drop enough for the assembly version to achieve a 10-20% speedup (and
sometimes more, depending on the amount of overall redundancy in the
files). Even here, though, cache contention can still be the limiting
factor, depending on the nature of the program using the zlib library.
This may also mean that better improvements will be seen on a Pentium
with MMX, which suffers much less from L1-cache contention, but I have
not yet verified this.

Note that this code has been tailored for the Pentium in particular,
and will not perform well on the Pentium Pro (due to the use of a
partial register in the inner loop).

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory,
then run:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o
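
Because the average speedup quoted above is often smaller than the
run-to-run variance, a single timing of each build says very little.
The following is a minimal sketch, not part of the patch, of one way
to compare repeated timings. It is written in Python purely as a
measurement harness; the function names (time_runs, speedup_percent,
spread_percent, exceeds_noise) are hypothetical, and using the
standard deviation as the noise threshold is an assumption, not a
rigorous statistical test.

```python
import statistics
import time


def time_runs(func, repeats=5):
    """Return the wall-clock duration in seconds of each of `repeats` calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def speedup_percent(baseline, candidate):
    """Percentage reduction in mean run time of candidate versus baseline."""
    base_mean = statistics.mean(baseline)
    return 100.0 * (base_mean - statistics.mean(candidate)) / base_mean


def spread_percent(samples):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


def exceeds_noise(baseline, candidate):
    """True if the measured speedup is larger than the spread of either set."""
    noise = max(spread_percent(baseline), spread_percent(candidate))
    return speedup_percent(baseline, candidate) > noise
```

Python's own zlib module uses whatever zlib the interpreter was built
against, not the patched build, so the callable passed to time_runs
would have to run an external program linked against each build (for
example via subprocess.run on the minigzip example program that ships
with zlib), once for the plain C build and once for the ASMV build.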