; gvmat32.asm -- Asm portion of the optimized longest_match for 32-bit x86
; Copyright (C) 1995-1996 Jean-loup Gailly and Gilles Vollant.
; File written by Gilles Vollant, by modifying the longest_match
;  from Jean-loup Gailly in deflate.c
;
;         http://www.zlib.net
;         http://www.winimage.com/zLibDll
;         http://www.muppetlabs.com/~breadbox/software/assembly.html
;
; For Visual C++ 4.x and higher and ML 6.x and higher
;   ml.exe is in the \MASM611C directory of the Win95 DDK
;   ml.exe is also distributed at http://www.masm32.com/masmdl.htm
;    and in the VC++ 2003 toolkit at http://msdn.microsoft.com/visualc/vctoolkit2003/
;
; This file contains two implementations of longest_match:
;
;  longest_match_7fff : written in 1996 by Gilles Vollant, optimized for the
;            first Pentium. Assumes s->w_mask == 0x7fff.
;  longest_match_686 : written by Brian Raiter (1998), optimized for the Pentium Pro.
;
;  To use an assembly version of longest_match, you need to define ASMV in your project.
;  There are two ways to use gvmat32.asm:
;
;  A) Suggested method
;    If you want to include both longest_match_7fff and longest_match_686,
;    compile the asm file by running
;           ml /coff /Zi /Flgvmat32.lst /c gvmat32.asm
;    and include gvmat32c.c in your project.
;    If you have an old CPU (386, 486 or first Pentium) and s->w_mask==0x7fff,
;        longest_match_7fff will be used.
;    If you have a more modern CPU (Pentium Pro, II and higher),
;        longest_match_686 will be used.
;    On an old CPU with s->w_mask!=0x7fff, longest_match_686 will be used,
;        but this is not a situation you will find often.
;
;  B) Alternative
;    If you are not interested in old CPU performance and want the smallest
;       possible binaries,
;
;    compile the asm file by running
;           ml /coff /Zi /c /Flgvmat32.lst /DNOOLDPENTIUMCODE gvmat32.asm
;    and do not include gvmat32c.c in your project (or also define
;              NOOLDPENTIUMCODE).
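;
;  A hypothetical C sketch of the selection rule described in A). It is
;  NOT the code of gvmat32c.c, which is not part of this file. It only
;  restates the rule, assuming "family" is the CPU family number
;  (3 = 386, 4 = 486, 5 = first Pentium, 6 and up = Pentium Pro and later):
;
;    /* returns 1 when longest_match_7fff should be used, 0 for _686 */
;    static int use_longest_match_7fff(unsigned family, unsigned w_mask)
;    {
;        return family <= 5 && w_mask == 0x7fff;
;    }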
;
; Note: as far as I know, longest_match_686 is much faster than longest_match_7fff
;        on Pentium Pro/II/III and somewhat faster on P4, but it seems that
;        longest_match_7fff can be (very slightly) faster on AMD Athlon64/K8.
;
; See below: zlib1222add must be adjusted if you use a zlib version < 1.2.2.2.

;uInt longest_match_7fff(s, cur_match)
;    deflate_state *s;
;    IPos cur_match;                             /* current match */

    NbStack         equ     76
    cur_match       equ     dword ptr[esp+NbStack-0]
    str_s           equ     dword ptr[esp+NbStack-4]
; 5 dwords on top (ret,ebp,edi,esi,ebx)
    adrret          equ     dword ptr[esp+NbStack-8]
    pushebp         equ     dword ptr[esp+NbStack-12]
    pushedi         equ     dword ptr[esp+NbStack-16]
    pushesi         equ     dword ptr[esp+NbStack-20]
    pushebx         equ     dword ptr[esp+NbStack-24]

    chain_length    equ     dword ptr [esp+NbStack-28]
    limit           equ     dword ptr [esp+NbStack-32]
    best_len        equ     dword ptr [esp+NbStack-36]
    window          equ     dword ptr [esp+NbStack-40]
    prev            equ     dword ptr [esp+NbStack-44]
    scan_start      equ      word ptr [esp+NbStack-48]
    wmask           equ     dword ptr [esp+NbStack-52]
    match_start_ptr equ     dword ptr [esp+NbStack-56]
    nice_match      equ     dword ptr [esp+NbStack-60]
    scan            equ     dword ptr [esp+NbStack-64]

    windowlen       equ     dword ptr [esp+NbStack-68]
    match_start     equ     dword ptr [esp+NbStack-72]
    strend          equ     dword ptr [esp+NbStack-76]
    NbStackAdd      equ     (NbStack-24)
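
; Worked check of the frame above (comment only, nothing is assembled):
;   on entry               [esp] = return address, [esp+4] = s, [esp+8] = cur_match
;   push ebp/edi/esi/ebx   esp -= 16  ->  cur_match at [esp+24]
;   sub esp,NbStackAdd     esp -= 52  ->  cur_match at [esp+76] = [esp+NbStack-0]
;   NbStackAdd = NbStack-24 = 52 bytes of locals, from strend at [esp+0]
;   up to chain_length at [esp+48]. The four saved registers occupy
;   [esp+52] to [esp+67], and the return address sits at [esp+68].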

    .386p

    name    gvmatch
    .MODEL  FLAT



;  All the +zlib1222add offsets are due to the addition of fields
;  in zlib in the deflate_state structure since the asm code was first written
;  (if you compile with zlib 1.0.4 or older, use "zlib1222add equ (-4)").
;  (if you compile with zlib between 1.0.5 and 1.2.2.1, use "zlib1222add equ 0").
;  (if you compile with zlib 1.2.2.2 or later, use "zlib1222add equ 8").

    zlib1222add         equ     8

;  Note: these values are valid for a structure packed on an 8-byte boundary.
    dep_chain_length    equ     74h+zlib1222add
    dep_window          equ     30h+zlib1222add
    dep_strstart        equ     64h+zlib1222add
    dep_prev_length     equ     70h+zlib1222add
    dep_nice_match      equ     88h+zlib1222add
    dep_w_size          equ     24h+zlib1222add
    dep_prev            equ     38h+zlib1222add
    dep_w_mask          equ     2ch+zlib1222add
    dep_good_match      equ     84h+zlib1222add
    dep_match_start     equ     68h+zlib1222add
    dep_lookahead       equ     6ch+zlib1222add
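
; One way to re-check these offsets against your own zlib headers is a small
; C program built with the same compiler, packing and 32-bit target as zlib.
; This is a sketch. It assumes zlib's source directory, with its internal
; deflate.h, is on the include path. Each printed value should equal the
; matching dep_ equate. For example, dep_w_size = 24h+8 = 2Ch, so the
; w_size line should print 0x2c.
;
;   #include <stdio.h>
;   #include <stddef.h>
;   #include "deflate.h"
;
;   int main(void)
;   {
;       printf("w_size           %#x\n", (unsigned)offsetof(deflate_state, w_size));
;       printf("w_mask           %#x\n", (unsigned)offsetof(deflate_state, w_mask));
;       printf("window           %#x\n", (unsigned)offsetof(deflate_state, window));
;       printf("prev             %#x\n", (unsigned)offsetof(deflate_state, prev));
;       printf("strstart         %#x\n", (unsigned)offsetof(deflate_state, strstart));
;       printf("match_start      %#x\n", (unsigned)offsetof(deflate_state, match_start));
;       printf("lookahead        %#x\n", (unsigned)offsetof(deflate_state, lookahead));
;       printf("prev_length      %#x\n", (unsigned)offsetof(deflate_state, prev_length));
;       printf("max_chain_length %#x\n", (unsigned)offsetof(deflate_state, max_chain_length));
;       printf("good_match       %#x\n", (unsigned)offsetof(deflate_state, good_match));
;       printf("nice_match       %#x\n", (unsigned)offsetof(deflate_state, nice_match));
;       return 0;
;   }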


_TEXT                   segment

IFDEF NOUNDERLINE
   IFDEF NOOLDPENTIUMCODE
            public  longest_match
            public  match_init
   ELSE
            public  longest_match_7fff
            public  cpudetect32
            public  longest_match_686
   ENDIF
ELSE
   IFDEF NOOLDPENTIUMCODE
            public  _longest_match
            public  _match_init
   ELSE
            public  _longest_match_7fff
            public  _cpudetect32
            public  _longest_match_686
   ENDIF
ENDIF

    MAX_MATCH           equ     258
    MIN_MATCH           equ     3
    MIN_LOOKAHEAD       equ     (MAX_MATCH+MIN_MATCH+1)



IFNDEF NOOLDPENTIUMCODE
IFDEF NOUNDERLINE
longest_match_7fff   proc near
ELSE
_longest_match_7fff  proc near
ENDIF

    mov     edx,[esp+4]



    push    ebp
    push    edi
    push    esi
    push    ebx

    sub     esp,NbStackAdd

; initialize or check the variables used in match.asm.
    mov     ebp,edx

; chain_length = s->max_chain_length
; if (prev_length>=good_match) chain_length >>= 2
    mov     edx,[ebp+dep_chain_length]
    mov     ebx,[ebp+dep_prev_length]
    cmp     [ebp+dep_good_match],ebx
    ja      noshr
    shr     edx,2
noshr:
; we increment chain_length because, in the asm, the --chain_length is at the beginning of the loop
    inc     edx
    mov     edi,[ebp+dep_nice_match]
    mov     chain_length,edx
    mov     eax,[ebp+dep_lookahead]
    cmp     eax,edi
; if ((uInt)nice_match > s->lookahead) nice_match = s->lookahead;
    jae     nolookaheadnicematch
    mov     edi,eax
nolookaheadnicematch:
; best_len = s->prev_length
    mov     best_len,ebx

; window = s->window
    mov     esi,[ebp+dep_window]
    mov     ecx,[ebp+dep_strstart]
    mov     window,esi

    mov     nice_match,edi
; scan = window + strstart
    add     esi,ecx
    mov     scan,esi
; dx = *(ushf*)scan
    mov     dx,word ptr [esi]
; bx = *(ushf*)(scan+best_len-1)
    mov     bx,word ptr [esi+ebx-1]
    add     esi,MAX_MATCH-1
; scan_start = *scan
    mov     scan_start,dx
; strend = scan + MAX_MATCH-1
    mov     strend,esi
; bx = scan_end = *(ushf*)(scan+best_len-1)

;    IPos limit = s->strstart > (IPos)MAX_DIST(s) ?
;        s->strstart - (IPos)MAX_DIST(s) : NIL;

    mov     esi,[ebp+dep_w_size]
    sub     esi,MIN_LOOKAHEAD
; here esi = MAX_DIST(s)
    sub     ecx,esi
    ja      nodist
    xor     ecx,ecx
nodist:
    mov     limit,ecx
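
; For reference, a C sketch of what the prologue above computes. match_setup
; is a hypothetical helper, not a zlib function; its parameter names follow
; the deflate_state fields. With a 32K window,
; MAX_DIST(s) = 32768 - 262 = 32506.
;
;   typedef unsigned int uInt;
;
;   static void match_setup(uInt max_chain_length, uInt prev_length,
;                           uInt good_match, uInt nice_match_in,
;                           uInt lookahead, uInt strstart, uInt w_size,
;                           uInt *chain_length, uInt *nice_match, uInt *limit)
;   {
;       uInt max_dist = w_size - (258 + 3 + 1);  /* w_size - MIN_LOOKAHEAD */
;
;       *chain_length = max_chain_length;
;       if (prev_length >= good_match)
;           *chain_length >>= 2;
;       *chain_length += 1;              /* the asm pre-decrements in the loop */
;       *nice_match = nice_match_in > lookahead ? lookahead : nice_match_in;
;       *limit = strstart > max_dist ? strstart - max_dist : 0;  /* NIL == 0 */
;   }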

; prev = s->prev
    mov     edx,[ebp+dep_prev]
    mov     prev,edx

;
    mov     edx,dword ptr [ebp+dep_match_start]
    mov     bp,scan_start
    mov     eax,cur_match
    mov     match_start,edx

    mov     edx,window
    mov     edi,edx
    add     edi,best_len
    mov     esi,prev
    dec     edi
; windowlen = window + best_len -1
    mov     windowlen,edi

    jmp     beginloop2
    align   4

; here, in the loop
;       eax = ax = cur_match
;       ecx = limit
;        bx = scan_end
;        bp = scan_start
;       edi = windowlen (window + best_len -1)
;       esi = prev


; here, chain_length <= 16
normalbeg0add16:
    add     chain_length,16
    jz      exitloop
normalbeg0:
    cmp     word ptr[edi+eax],bx
    je      normalbeg2noroll
rcontlabnoroll:
; cur_match = prev[cur_match & wmask]
    and     eax,7fffh
    mov     ax,word ptr[esi+eax*2]
; if cur_match <= limit, go to exitloop
    cmp     ecx,eax
    jnb     exitloop
; if --chain_length != 0, go to normalbeg0, else go to exitloop
    dec     chain_length
    jnz     normalbeg0
    jmp     exitloop

normalbeg2noroll:
; if (scan_start==*(cur_match+window)) goto normalbeg2
    cmp     bp,word ptr[edx+eax]
    jne     rcontlabnoroll
    jmp     normalbeg2

contloop3:
    mov     edi,windowlen

; cur_match = prev[cur_match & wmask]
    and     eax,7fffh
    mov     ax,word ptr[esi+eax*2]
; if cur_match <= limit, go to exitloop
    cmp     ecx,eax
jnbexitloopshort1:
    jnb     exitloop
; the --chain_length test is done at beginloop2 (sub chain_length,16+1)


; begin the main loop
beginloop2:
    sub     chain_length,16+1
; if chain_length <= 16, don't use the unrolled loop
    jna     normalbeg0add16

do16:
    cmp     word ptr[edi+eax],bx
    je      normalbeg2dc0

maccn   MACRO   lab
    and     eax,7fffh
    mov     ax,word ptr[esi+eax*2]
    cmp     ecx,eax
    jnb     exitloop
    cmp     word ptr[edi+eax],bx
    je      lab
    ENDM

rcontloop0:
    maccn   normalbeg2dc1

rcontloop1:
    maccn   normalbeg2dc2

rcontloop2:
    maccn   normalbeg2dc3

rcontloop3:
    maccn   normalbeg2dc4

rcontloop4:
    maccn   normalbeg2dc5

rcontloop5:
    maccn   normalbeg2dc6

rcontloop6:
    maccn   normalbeg2dc7

rcontloop7:
    maccn   normalbeg2dc8

rcontloop8:
    maccn   normalbeg2dc9

rcontloop9:
    maccn   normalbeg2dc10

rcontloop10:
    maccn   short normalbeg2dc11

rcontloop11:
    maccn   short normalbeg2dc12

rcontloop12:
    maccn   short normalbeg2dc13

rcontloop13:
    maccn   short normalbeg2dc14

rcontloop14:
    maccn   short normalbeg2dc15

rcontloop15:
    and     eax,7fffh
    mov     ax,word ptr[esi+eax*2]
    cmp     ecx,eax
    jnb     exitloop

    sub     chain_length,16
    ja      do16
    jmp     normalbeg0add16
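
; Ignoring the 16-way unrolling and the register allocation, the search above
; follows the hash-chain walk of longest_match in deflate.c. Below is a
; simplified C sketch. longest_match_ref is a hypothetical name. It compares
; byte by byte instead of with repe cmpsd, and it checks byte 2 instead of
; relying on the hash. chain_length is the value before the prologue's +1
; adjustment. The asm also stores match_start into s and returns
; min(best_len, s->lookahead).
;
;   #include <string.h>
;
;   static unsigned longest_match_ref(const unsigned char *window,
;                                     const unsigned short *prev,
;                                     unsigned strstart, unsigned cur_match,
;                                     unsigned limit, unsigned chain_length,
;                                     unsigned best_len, unsigned nice_match,
;                                     unsigned *match_start)
;   {
;       const unsigned char *scan = window + strstart;
;       do {
;           const unsigned char *match = window + cur_match;
;           unsigned len;
;           /* cheap filters, as in do16/maccn and normbeg */
;           if (memcmp(match + best_len - 1, scan + best_len - 1, 2) != 0 ||
;               memcmp(match, scan, 2) != 0)
;               continue;                    /* goes to the while test */
;           for (len = 2; len < 258 && scan[len] == match[len]; len++)
;               ;
;           if (len > best_len) {
;               *match_start = cur_match;
;               best_len = len;
;               if (len >= nice_match)
;                   break;
;           }
;       } while ((cur_match = prev[cur_match & 0x7fff]) > limit &&
;                --chain_length != 0);
;       return best_len;
;   }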

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

normbeg MACRO   rcontlab,valsub
; if we are here, we know that *(match+best_len-1) == scan_end
    cmp     bp,word ptr[edx+eax]
; if (match != scan_start) goto rcontlab
    jne     rcontlab
; compute the correct chain_length, then compare the scan and match strings
    add     chain_length,16-valsub
    jmp     iseq
    ENDM
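
; Worked example of the chain_length bookkeeping (comment only): suppose
; chain_length = C on reaching beginloop2. "sub chain_length,16+1" leaves
; C-17. If candidate k (0..15) of the block passes both checks, normbeg adds
; 16-k, leaving C-17+16-k = C-1-k, so exactly k+1 candidates were consumed.
; If none of the 16 candidates match, rcontloop15 subtracts 16 more, giving
; C-33. That is C-16 for the 16 candidates consumed, minus 17 for the next
; block.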


normalbeg2dc11:
    normbeg rcontloop11,11

normalbeg2dc12:
    normbeg short rcontloop12,12

normalbeg2dc13:
    normbeg short rcontloop13,13

normalbeg2dc14:
    normbeg short rcontloop14,14

normalbeg2dc15:
    normbeg short rcontloop15,15

normalbeg2dc10:
    normbeg rcontloop10,10

normalbeg2dc9:
    normbeg rcontloop9,9

normalbeg2dc8:
    normbeg rcontloop8,8

normalbeg2dc7:
    normbeg rcontloop7,7

normalbeg2dc6:
    normbeg rcontloop6,6

normalbeg2dc5:
    normbeg rcontloop5,5

normalbeg2dc4:
    normbeg rcontloop4,4

normalbeg2dc3:
    normbeg rcontloop3,3

normalbeg2dc2:
    normbeg rcontloop2,2

normalbeg2dc1:
    normbeg rcontloop1,1

normalbeg2dc0:
    normbeg rcontloop0,0


; we go to normalbeg2 because *(ushf*)(match+best_len-1) == scan_end

normalbeg2:
    mov     edi,window

    cmp     bp,word ptr[edi+eax]
    jne     contloop3                   ; if *(ushf*)match != scan_start, continue

iseq:
; if we are here, we know that *(match+best_len-1) == scan_end
; and (match == scan_start)

    mov     edi,edx
    mov     esi,scan                    ; esi = scan
    add     edi,eax                     ; edi = window + cur_match = match

    mov     edx,[esi+3]                 ; manually compare the dwords at scan+3
    xor     edx,[edi+3]                 ; and match+3

    jz      begincompare                ; if equal, go to long compare

; determine the first mismatching byte and compute len (in esi)
    or      dl,dl
    je      eq1rr
    mov     esi,3
    jmp     trfinval
eq1rr:
    or      dx,dx
    je      eq1

    mov     esi,4
    jmp     trfinval
eq1:
    and     edx,0ffffffh
    jz      eq11
    mov     esi,5
    jmp     trfinval
eq11:
    mov     esi,6
    jmp     trfinval

begincompare:
    ; here we know that scan and match begin with the same bytes
    add     edi,6
    add     esi,6
    mov     ecx,(MAX_MATCH-(2+4))/4     ; scan for at most MAX_MATCH bytes
    repe    cmpsd                       ; loop until mismatch

    je      trfin                       ; go to trfin if no mismatch was found
; determine the first mismatching byte
    sub     esi,4
    mov     edx,[edi-4]
    xor     edx,[esi]

    or      dl,dl
    jnz     trfin
    inc     esi

    or      dx,dx
    jnz     trfin
    inc     esi

    and     edx,0ffffffh
    jnz     trfin
    inc     esi
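
; Both mismatch searches above use the same trick. The two strings are loaded
; as little-endian dwords, x = a XOR b is nonzero, and the lowest nonzero byte
; of x marks the first mismatching position. Below is a C sketch of the test
; sequence (or dl,dl / or dx,dx / and edx,0ffffffh). first_diff_byte is a
; hypothetical name.
;
;   #include <stdint.h>
;
;   /* x must be nonzero; returns 0..3 */
;   static unsigned first_diff_byte(uint32_t x)
;   {
;       if (x & 0xFFu)     return 0;   /* or  dl,dl        */
;       if (x & 0xFFFFu)   return 1;   /* or  dx,dx        */
;       if (x & 0xFFFFFFu) return 2;   /* and edx,0ffffffh */
;       return 3;
;   }
;
; For example, first_diff_byte(0x00FF0000) is 2. A mismatch in the dword at
; scan+3 therefore gives len = 3+2 = 5, which matches "mov esi,5" at eq1.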

trfin:
    sub     esi,scan          ; esi = len
trfinval:
; here we have finished the compare, and esi contains the length of the equal string
    cmp     esi,best_len        ; if len > best_len, go to newbestlen
    ja      short newbestlen
; now restore edx, ecx and esi for the big loop
    mov     esi,prev
    mov     ecx,limit
    mov     edx,window
    jmp     contloop3

newbestlen:
    mov     best_len,esi        ; len becomes best_len

    mov     match_start,eax     ; save new position as match_start
    cmp     esi,nice_match      ; if best_len >= nice_match, exit
    jae     exitloop
    mov     ecx,scan
    mov     edx,window          ; restore edx=window
    add     ecx,esi
    add     esi,edx

    dec     esi
    mov     windowlen,esi       ; windowlen = window + best_len-1
    mov     bx,[ecx-1]          ; bx = *(scan+best_len-1) = scan_end

; now restore ecx and esi for the big loop:
    mov     esi,prev
    mov     ecx,limit
    jmp     contloop3

exitloop:
; exit : s->match_start=match_start
    mov     ebx,match_start
    mov     ebp,str_s
    mov     ecx,best_len
    mov     dword ptr [ebp+dep_match_start],ebx
    mov     eax,dword ptr [ebp+dep_lookahead]
    cmp     ecx,eax
    ja      minexlo
    mov     eax,ecx
minexlo:
; return min(best_len,s->lookahead)

; restore stack and registers ebx,esi,edi,ebp
    add     esp,NbStackAdd

    pop     ebx
    pop     esi
    pop     edi
    pop     ebp
    ret
InfoAuthor:
; please don't remove this string!
; You are free to use gvmat32 in any free or commercial application, provided you do not remove this string from the binary!
    db     0dh,0ah,"GVMat32 optimised assembly code written 1996-98 by Gilles Vollant",0dh,0ah



IFDEF NOUNDERLINE
longest_match_7fff   endp
ELSE
_longest_match_7fff  endp
ENDIF


IFDEF NOUNDERLINE
cpudetect32     proc near
ELSE
_cpudetect32    proc near
ENDIF

    push    ebx

    pushfd                  ; push original EFLAGS
    pop     eax             ; get original EFLAGS
    mov     ecx, eax        ; save original EFLAGS
    xor     eax, 40000h     ; flip AC bit in EFLAGS
    push    eax             ; save new EFLAGS value on stack
    popfd                   ; replace current EFLAGS value
    pushfd                  ; get new EFLAGS
    pop     eax             ; store new EFLAGS in EAX
    xor     eax, ecx        ; can't toggle AC bit, processor=80386
    jz      end_cpu_is_386  ; jump if 80386 processor
    push    ecx
    popfd                   ; restore AC bit in EFLAGS first

    pushfd
    pushfd
    pop     ecx

    mov     eax, ecx        ; get original EFLAGS
    xor     eax, 200000h    ; flip ID bit in EFLAGS
    push    eax             ; save new EFLAGS value on stack
    popfd                   ; replace current EFLAGS value
    pushfd                  ; get new EFLAGS
    pop     eax             ; store new EFLAGS in EAX
    popfd                   ; restore original EFLAGS
    xor     eax, ecx        ; can't toggle ID bit,
    je      is_old_486      ; processor=old

    mov     eax,1
    db      0fh,0a2h        ;CPUID

exitcpudetect:
    pop ebx
    ret

end_cpu_is_386:
    mov     eax,0300h
    jmp     exitcpudetect

is_old_486:
    mov     eax,0400h
    jmp     exitcpudetect

IFDEF NOUNDERLINE
cpudetect32     endp
ELSE
_cpudetect32    endp
ENDIF
ENDIF
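
; cpudetect32 returns one of three things:
;   0300h for a 386 (the AC flag, bit 18, is not writable);
;   0400h for a 486 without CPUID (the ID flag, bit 21, is not writable);
;   otherwise, the EAX signature from CPUID leaf 1, whose bits 8-11 hold the
;   CPU family.
; Below is a C sketch of decoding the result. cpu_family is a hypothetical
; helper. Family 6 covers Pentium Pro/II/III, and 0Fh covers P4 and K8.
;
;   static unsigned cpu_family(unsigned long cpudetect32_result)
;   {
;       /* e.g. 0300h -> 3, 0400h -> 4 */
;       return (unsigned)((cpudetect32_result >> 8) & 0xF);
;   }
;
; Modern compilers expose CPUID directly, for example __cpuid in MSVC's
; <intrin.h> or __get_cpuid in GCC/Clang's <cpuid.h>.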

MAX_MATCH       equ     258
MIN_MATCH       equ     3
MIN_LOOKAHEAD   equ     (MAX_MATCH + MIN_MATCH + 1)
MAX_MATCH_8_     equ     ((MAX_MATCH + 7) AND 0FFF0h)


;;; stack frame offsets

chainlenwmask   equ  esp + 0    ; high word: current chain len
                    ; low word: s->wmask
window      equ  esp + 4    ; local copy of s->window
windowbestlen   equ  esp + 8    ; s->window + bestlen
scanstart   equ  esp + 16   ; first two bytes of string
scanend     equ  esp + 12   ; last two bytes of string
scanalign   equ  esp + 20   ; dword-misalignment of string
nicematch   equ  esp + 24   ; a good enough match size
bestlen     equ  esp + 28   ; size of best match so far
scan        equ  esp + 32   ; ptr to string wanting match

LocalVarsSize   equ 36
;   saved ebx   byte esp + 36
;   saved edi   byte esp + 40
;   saved esi   byte esp + 44
;   saved ebp   byte esp + 48
;   return address  byte esp + 52
deflatestate    equ  esp + 56   ; the function arguments
curmatch    equ  esp + 60
633*44bedb31SLionel Sambuc
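;;; Frame arithmetic behind the offsets above (32-bit code, 4-byte
;;; slots):
;;;   - the four pushes (ebp, edi, esi, ebx) take 16 bytes;
;;;   - "sub esp, LocalVarsSize" takes 36 bytes;
;;;   - so the return address is at esp + 36 + 16 = esp + 52;
;;;   - a C caller pushes the two arguments right to left, so they sit
;;;     at esp + 56 (deflatestate) and esp + 60 (curmatch).
;;; The locals correspond roughly to this C struct (illustrative only;
;;; the name lm_locals is hypothetical):
;;;
;;;   struct lm_locals {                 /* offset from esp */
;;;       unsigned       chainlenwmask;  /*  0 */
;;;       unsigned char *window;         /*  4 */
;;;       unsigned char *windowbestlen;  /*  8 */
;;;       unsigned       scanend;        /* 12 */
;;;       unsigned       scanstart;      /* 16 */
;;;       unsigned       scanalign;      /* 20 */
;;;       unsigned       nicematch;      /* 24 */
;;;       unsigned       bestlen;        /* 28 */
;;;       unsigned char *scan;           /* 32 */
;;;   };                                 /* 36 bytes = LocalVarsSize */
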
;;; Offsets for fields in the deflate_state structure. These numbers
;;; are calculated from the definition of deflate_state, with the
;;; assumption that the compiler will dword-align the fields. (Thus,
;;; changing the definition of deflate_state could easily cause this
;;; program to crash horribly, without so much as a warning at
;;; compile time. Sigh.)

dsWSize     equ 36+zlib1222add
dsWMask     equ 44+zlib1222add
dsWindow    equ 48+zlib1222add
dsPrev      equ 56+zlib1222add
dsMatchLen  equ 88+zlib1222add
dsPrevMatch equ 92+zlib1222add
dsStrStart  equ 100+zlib1222add
dsMatchStart    equ 104+zlib1222add
dsLookahead equ 108+zlib1222add
dsPrevLen   equ 112+zlib1222add
dsMaxChainLen   equ 116+zlib1222add
dsGoodMatch equ 132+zlib1222add
dsNiceMatch equ 136+zlib1222add


;;; match.asm -- Pentium-Pro-optimized version of longest_match()
;;; Written for zlib 1.1.2
;;; Copyright (C) 1998 Brian Raiter <breadbox@muppetlabs.com>
;;; You can look at http://www.muppetlabs.com/~breadbox/software/assembly.html
;;;
;;; This is free software; you can redistribute it and/or modify it
;;; under the terms of the GNU General Public License.

;GLOBAL _longest_match, _match_init


;SECTION    .text

;;; uInt longest_match(deflate_state *deflatestate, IPos curmatch)

;_longest_match:
IFDEF NOOLDPENTIUMCODE
    IFDEF NOUNDERLINE
    longest_match       proc near
    ELSE
    _longest_match      proc near
    ENDIF
ELSE
    IFDEF NOUNDERLINE
    longest_match_686   proc near
    ELSE
    _longest_match_686  proc near
    ENDIF
ENDIF

;;; Save registers that the compiler may be using, and adjust esp to
;;; make room for our stack frame.

        push    ebp
        push    edi
        push    esi
        push    ebx
        sub esp, LocalVarsSize

;;; Retrieve the function arguments. ecx will hold cur_match
;;; throughout the entire function. edx will hold the pointer to the
;;; deflate_state structure during the function's setup (before
;;; entering the main loop).

        mov edx, [deflatestate]
        mov ecx, [curmatch]

;;; uInt wmask = s->w_mask;
;;; unsigned chain_length = s->max_chain_length;
;;; if (s->prev_length >= s->good_match) {
;;;     chain_length >>= 2;
;;; }

        mov eax, [edx + dsPrevLen]
        mov ebx, [edx + dsGoodMatch]
        cmp eax, ebx
        mov eax, [edx + dsWMask]
        mov ebx, [edx + dsMaxChainLen]
        jl  LastMatchGood
        shr ebx, 2
LastMatchGood:

;;; chainlen is decremented once beforehand so that the function can
;;; use the sign flag instead of the zero flag for the exit test.
;;; It is then shifted into the high word, to make room for the wmask
;;; value, which it will always accompany.

        dec ebx
        shl ebx, 16
        or  ebx, eax
        mov [chainlenwmask], ebx

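;;; The packed counter in C (a sketch; the variable name packed is
;;; illustrative). Two assumptions make the packing work:
;;;   - chain_length - 1 < 8000h, so the initial value is non-negative;
;;;   - cur_match stays below 10000h, so AND-ing it with the whole
;;;     dword applies only wmask.
;;; The first candidate is tested via LoopEntry without a decrement.
;;; Exactly chain_length candidates are therefore examined, as in the
;;; C loop:
;;;
;;;   unsigned packed = ((chain_length - 1) << 16) | wmask;
;;;   ...
;;;   cur_match = prev[cur_match & packed];   /* and ecx, edx       */
;;;   if (cur_match <= limit) break;          /* jbe LeaveNow       */
;;;   packed -= 0x10000;                      /* sub edx, 00010000h */
;;;   if ((int)packed < 0) break;             /* js LeaveNow        */
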
;;; if ((uInt)nice_match > s->lookahead) nice_match = s->lookahead;

        mov eax, [edx + dsNiceMatch]
        mov ebx, [edx + dsLookahead]
        cmp ebx, eax
        jl  LookaheadLess
        mov ebx, eax
LookaheadLess:  mov [nicematch], ebx

;;; register Bytef *scan = s->window + s->strstart;

        mov esi, [edx + dsWindow]
        mov [window], esi
        mov ebp, [edx + dsStrStart]
        lea edi, [esi + ebp]
        mov [scan], edi

;;; Determine how many bytes the scan ptr is off from being
;;; dword-aligned.

        mov eax, edi
        neg eax
        and eax, 3
        mov [scanalign], eax

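;;; The same computation in C (a sketch; uintptr_t comes from
;;; <stdint.h>). scanalign is the number of bytes, 0 to 3, from scan up
;;; to the next dword boundary. For example, if scan % 4 == 1, then
;;; scanalign == 3.
;;;
;;;   unsigned scanalign = (unsigned)((0 - (uintptr_t)scan) & 3);
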
;;; IPos limit = s->strstart > (IPos)MAX_DIST(s) ?
;;;     s->strstart - (IPos)MAX_DIST(s) : NIL;

        mov eax, [edx + dsWSize]
        sub eax, MIN_LOOKAHEAD
        sub ebp, eax
        jg  LimitPositive
        xor ebp, ebp
LimitPositive:

;;; int best_len = s->prev_length;

        mov eax, [edx + dsPrevLen]
        mov [bestlen], eax

;;; Store the sum of s->window + best_len in esi, and locally in windowbestlen.

        add esi, eax
        mov [windowbestlen], esi

;;; register ush scan_start = *(ushf*)scan;
;;; register ush scan_end   = *(ushf*)(scan+best_len-1);
;;; Posf *prev = s->prev;

        movzx   ebx, word ptr [edi]
        mov [scanstart], ebx
        movzx   ebx, word ptr [edi + eax - 1]
        mov [scanend], ebx
        mov edi, [edx + dsPrev]

;;; Jump into the main loop.

        mov edx, [chainlenwmask]
        jmp short LoopEntry

align 4

;;; do {
;;;     match = s->window + cur_match;
;;;     if (*(ushf*)(match+best_len-1) != scan_end ||
;;;         *(ushf*)match != scan_start) continue;
;;;     [...]
;;; } while ((cur_match = prev[cur_match & wmask]) > limit
;;;          && --chain_length != 0);
;;;
;;; Here is the inner loop of the function. The function will spend the
;;; majority of its time in this loop, and the majority of that time will
;;; be spent in the first ten instructions.
;;;
;;; Within this loop:
;;; ebx = scanend
;;; ecx = curmatch
;;; edx = chainlenwmask - i.e., ((chainlen << 16) | wmask)
;;; esi = windowbestlen - i.e., (window + bestlen)
;;; edi = prev
;;; ebp = limit

LookupLoop:
        and ecx, edx
        movzx   ecx, word ptr [edi + ecx*2]
        cmp ecx, ebp
        jbe LeaveNow
        sub edx, 00010000h
        js  LeaveNow
LoopEntry:  movzx   eax, word ptr [esi + ecx - 1]
        cmp eax, ebx
        jnz LookupLoop
        mov eax, [window]
        movzx   eax, word ptr [eax + ecx]
        cmp eax, [scanstart]
        jnz LookupLoop

;;; Store the current value of chainlen.

        mov [chainlenwmask], edx

;;; Point edi to the string under scrutiny, and esi to the string we
;;; are hoping to match it up with. In actuality, esi and edi are
;;; both pointed (MAX_MATCH_8 + scanalign) bytes ahead, and edx is
;;; initialized to -MAX_MATCH_8, so that [edi + edx] starts out at
;;; scan + scanalign. (MAX_MATCH_8 = 264 = 0108h.)

        mov esi, [window]
        mov edi, [scan]
        add esi, ecx
        mov eax, [scanalign]
        mov edx, 0fffffef8h; -(MAX_MATCH_8)
        lea edi, [edi + eax + 0108h] ;MAX_MATCH_8]
        lea esi, [esi + eax + 0108h] ;MAX_MATCH_8]

;;; Test the strings for equality, 8 bytes at a time. At the end,
;;; adjust edx so that it is offset to the exact byte that mismatched.
;;;
;;; We already know at this point that the first three bytes of the
;;; strings match each other. Two of them were compared explicitly
;;; above, and the third is implied by the equal hash keys, as
;;; deflate.c's longest_match also assumes. These bytes can be safely
;;; passed over before starting the compare loop. So what this code
;;; does is skip over 0-3 bytes, as much as necessary in order to
;;; dword-align the edi pointer. (esi will still be misaligned three
;;; times out of four.)
;;;
;;; It should be confessed that this loop usually does not represent
;;; much of the total running time. Replacing it with a more
;;; straightforward "rep cmpsb" would not drastically degrade
;;; performance.

LoopCmps:
        mov eax, [esi + edx]
        xor eax, [edi + edx]
        jnz LeaveLoopCmps
        mov eax, [esi + edx + 4]
        xor eax, [edi + edx + 4]
        jnz LeaveLoopCmps4
        add edx, 8
        jnz LoopCmps
        jmp short LenMaximum
LeaveLoopCmps4: add edx, 4
LeaveLoopCmps:  test    eax, 0000FFFFh
        jnz LenLower
        add edx,  2
        shr eax, 16
LenLower:   sub al, 1
        adc edx, 0

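;;; The mismatch adjustment above in C (a sketch assuming little-endian
;;; byte order, as on x86; the names x and off are illustrative).
;;;   - x is the nonzero XOR of the two dwords that differed.
;;;   - off has already been advanced by 4 if the mismatch was in the
;;;     second dword of the pair.
;;;   - The first differing byte is the lowest nonzero byte of x.
;;;
;;;   if ((x & 0xFFFFu) == 0) { off += 2; x >>= 16; } /* test/add/shr */
;;;   if ((x & 0xFFu) == 0)   off += 1;               /* sub al,1; adc */
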
;;; Calculate the length of the match. If it is at least MAX_MATCH,
;;; then automatically accept it as the best possible match and leave.

        lea eax, [edi + edx]
        mov edi, [scan]
        sub eax, edi
        cmp eax, MAX_MATCH
        jge LenMaximum

;;; If the length of the match is not longer than the best match we
;;; have so far, then forget it and return to the lookup loop.

        mov edx, [deflatestate]
        mov ebx, [bestlen]
        cmp eax, ebx
        jg  LongerMatch
        mov esi, [windowbestlen]
        mov edi, [edx + dsPrev]
        mov ebx, [scanend]
        mov edx, [chainlenwmask]
        jmp LookupLoop

;;;         s->match_start = cur_match;
;;;         best_len = len;
;;;         if (len >= nice_match) break;
;;;         scan_end = *(ushf*)(scan+best_len-1);

LongerMatch:    mov ebx, [nicematch]
        mov [bestlen], eax
        mov [edx + dsMatchStart], ecx
        cmp eax, ebx
        jge LeaveNow
        mov esi, [window]
        add esi, eax
        mov [windowbestlen], esi
        movzx   ebx, word ptr [edi + eax - 1]
        mov edi, [edx + dsPrev]
        mov [scanend], ebx
        mov edx, [chainlenwmask]
        jmp LookupLoop

;;; Accept the current string, with the maximum possible length.

LenMaximum: mov edx, [deflatestate]
        mov dword ptr [bestlen], MAX_MATCH
        mov [edx + dsMatchStart], ecx

;;; if ((uInt)best_len <= s->lookahead) return (uInt)best_len;
;;; return s->lookahead;

LeaveNow:
        mov edx, [deflatestate]
        mov ebx, [bestlen]
        mov eax, [edx + dsLookahead]
        cmp ebx, eax
        jg  LookaheadRet
        mov eax, ebx
LookaheadRet:

;;; Restore the stack and the saved registers, and return to the caller.

        add esp, LocalVarsSize
        pop ebx
        pop esi
        pop edi
        pop ebp

        ret
; Please don't remove this string!
; You can freely use gvmat32 in any free or commercial app, provided you don't remove this string from the binary!
    db     0dh,0ah,"asm686 with masm, optimised assembly code from Brian Raiter, written 1998",0dh,0ah


IFDEF NOOLDPENTIUMCODE
    IFDEF NOUNDERLINE
    longest_match       endp
    ELSE
    _longest_match      endp
    ENDIF

    IFDEF NOUNDERLINE
    match_init      proc near
                    ret
    match_init      endp
    ELSE
    _match_init     proc near
                    ret
    _match_init     endp
    ENDIF
ELSE
    IFDEF NOUNDERLINE
    longest_match_686   endp
    ELSE
    _longest_match_686  endp
    ENDIF
ENDIF

_TEXT   ends
end