1*44bedb31SLionel Sambuc; gvmat32.asm -- Asm portion of the optimized longest_match for 32 bits x86 2*44bedb31SLionel Sambuc; Copyright (C) 1995-1996 Jean-loup Gailly and Gilles Vollant. 3*44bedb31SLionel Sambuc; File written by Gilles Vollant, by modifiying the longest_match 4*44bedb31SLionel Sambuc; from Jean-loup Gailly in deflate.c 5*44bedb31SLionel Sambuc; 6*44bedb31SLionel Sambuc; http://www.zlib.net 7*44bedb31SLionel Sambuc; http://www.winimage.com/zLibDll 8*44bedb31SLionel Sambuc; http://www.muppetlabs.com/~breadbox/software/assembly.html 9*44bedb31SLionel Sambuc; 10*44bedb31SLionel Sambuc; For Visual C++ 4.x and higher and ML 6.x and higher 11*44bedb31SLionel Sambuc; ml.exe is in directory \MASM611C of Win95 DDK 12*44bedb31SLionel Sambuc; ml.exe is also distributed in http://www.masm32.com/masmdl.htm 13*44bedb31SLionel Sambuc; and in VC++2003 toolkit at http://msdn.microsoft.com/visualc/vctoolkit2003/ 14*44bedb31SLionel Sambuc; 15*44bedb31SLionel Sambuc; this file contain two implementation of longest_match 16*44bedb31SLionel Sambuc; 17*44bedb31SLionel Sambuc; longest_match_7fff : written 1996 by Gilles Vollant optimized for 18*44bedb31SLionel Sambuc; first Pentium. Assume s->w_mask == 0x7fff 19*44bedb31SLionel Sambuc; longest_match_686 : written by Brian raiter (1998), optimized for Pentium Pro 20*44bedb31SLionel Sambuc; 21*44bedb31SLionel Sambuc; for using an seembly version of longest_match, you need define ASMV in project 22*44bedb31SLionel Sambuc; There is two way in using gvmat32.asm 23*44bedb31SLionel Sambuc; 24*44bedb31SLionel Sambuc; A) Suggested method 25*44bedb31SLionel Sambuc; if you want include both longest_match_7fff and longest_match_686 26*44bedb31SLionel Sambuc; compile the asm file running 27*44bedb31SLionel Sambuc; ml /coff /Zi /Flgvmat32.lst /c gvmat32.asm 28*44bedb31SLionel Sambuc; and include gvmat32c.c in your project 29*44bedb31SLionel Sambuc; if you have an old cpu (386,486 or first Pentium) and s->w_mask==0x7fff, 30*44bedb31SLionel Sambuc; longest_match_7fff will be used 31*44bedb31SLionel Sambuc; if you have a more modern CPU (Pentium Pro, II and higher) 32*44bedb31SLionel Sambuc; longest_match_686 will be used 33*44bedb31SLionel Sambuc; on old cpu with s->w_mask!=0x7fff, longest_match_686 will be used, 34*44bedb31SLionel Sambuc; but this is not a sitation you'll find often 35*44bedb31SLionel Sambuc; 36*44bedb31SLionel Sambuc; B) Alternative 37*44bedb31SLionel Sambuc; if you are not interresed in old cpu performance and want the smaller 38*44bedb31SLionel Sambuc; binaries possible 39*44bedb31SLionel Sambuc; 40*44bedb31SLionel Sambuc; compile the asm file running 41*44bedb31SLionel Sambuc; ml /coff /Zi /c /Flgvmat32.lst /DNOOLDPENTIUMCODE gvmat32.asm 42*44bedb31SLionel Sambuc; and do not include gvmat32c.c in your project (ou define also 43*44bedb31SLionel Sambuc; NOOLDPENTIUMCODE) 44*44bedb31SLionel Sambuc; 45*44bedb31SLionel Sambuc; note : as I known, longest_match_686 is very faster than longest_match_7fff 46*44bedb31SLionel Sambuc; on pentium Pro/II/III, faster (but less) in P4, but it seem 47*44bedb31SLionel Sambuc; longest_match_7fff can be faster (very very litte) on AMD Athlon64/K8 48*44bedb31SLionel Sambuc; 49*44bedb31SLionel Sambuc; see below : zlib1222add must be adjuster if you use a zlib version < 1.2.2.2 50*44bedb31SLionel Sambuc 51*44bedb31SLionel Sambuc;uInt longest_match_7fff(s, cur_match) 52*44bedb31SLionel Sambuc; deflate_state *s; 53*44bedb31SLionel Sambuc; IPos cur_match; /* current match */ 54*44bedb31SLionel Sambuc 55*44bedb31SLionel Sambuc NbStack equ 76 56*44bedb31SLionel Sambuc cur_match equ dword ptr[esp+NbStack-0] 57*44bedb31SLionel Sambuc str_s equ dword ptr[esp+NbStack-4] 58*44bedb31SLionel Sambuc; 5 dword on top (ret,ebp,esi,edi,ebx) 59*44bedb31SLionel Sambuc adrret equ dword ptr[esp+NbStack-8] 60*44bedb31SLionel Sambuc pushebp equ dword ptr[esp+NbStack-12] 61*44bedb31SLionel Sambuc pushedi equ dword ptr[esp+NbStack-16] 62*44bedb31SLionel Sambuc pushesi equ dword ptr[esp+NbStack-20] 63*44bedb31SLionel Sambuc pushebx equ dword ptr[esp+NbStack-24] 64*44bedb31SLionel Sambuc 65*44bedb31SLionel Sambuc chain_length equ dword ptr [esp+NbStack-28] 66*44bedb31SLionel Sambuc limit equ dword ptr [esp+NbStack-32] 67*44bedb31SLionel Sambuc best_len equ dword ptr [esp+NbStack-36] 68*44bedb31SLionel Sambuc window equ dword ptr [esp+NbStack-40] 69*44bedb31SLionel Sambuc prev equ dword ptr [esp+NbStack-44] 70*44bedb31SLionel Sambuc scan_start equ word ptr [esp+NbStack-48] 71*44bedb31SLionel Sambuc wmask equ dword ptr [esp+NbStack-52] 72*44bedb31SLionel Sambuc match_start_ptr equ dword ptr [esp+NbStack-56] 73*44bedb31SLionel Sambuc nice_match equ dword ptr [esp+NbStack-60] 74*44bedb31SLionel Sambuc scan equ dword ptr [esp+NbStack-64] 75*44bedb31SLionel Sambuc 76*44bedb31SLionel Sambuc windowlen equ dword ptr [esp+NbStack-68] 77*44bedb31SLionel Sambuc match_start equ dword ptr [esp+NbStack-72] 78*44bedb31SLionel Sambuc strend equ dword ptr [esp+NbStack-76] 79*44bedb31SLionel Sambuc NbStackAdd equ (NbStack-24) 80*44bedb31SLionel Sambuc 81*44bedb31SLionel Sambuc .386p 82*44bedb31SLionel Sambuc 83*44bedb31SLionel Sambuc name gvmatch 84*44bedb31SLionel Sambuc .MODEL FLAT 85*44bedb31SLionel Sambuc 86*44bedb31SLionel Sambuc 87*44bedb31SLionel Sambuc 88*44bedb31SLionel Sambuc; all the +zlib1222add offsets are due to the addition of fields 89*44bedb31SLionel Sambuc; in zlib in the deflate_state structure since the asm code was first written 90*44bedb31SLionel Sambuc; (if you compile with zlib 1.0.4 or older, use "zlib1222add equ (-4)"). 91*44bedb31SLionel Sambuc; (if you compile with zlib between 1.0.5 and 1.2.2.1, use "zlib1222add equ 0"). 92*44bedb31SLionel Sambuc; if you compile with zlib 1.2.2.2 or later , use "zlib1222add equ 8"). 93*44bedb31SLionel Sambuc 94*44bedb31SLionel Sambuc zlib1222add equ 8 95*44bedb31SLionel Sambuc 96*44bedb31SLionel Sambuc; Note : these value are good with a 8 bytes boundary pack structure 97*44bedb31SLionel Sambuc dep_chain_length equ 74h+zlib1222add 98*44bedb31SLionel Sambuc dep_window equ 30h+zlib1222add 99*44bedb31SLionel Sambuc dep_strstart equ 64h+zlib1222add 100*44bedb31SLionel Sambuc dep_prev_length equ 70h+zlib1222add 101*44bedb31SLionel Sambuc dep_nice_match equ 88h+zlib1222add 102*44bedb31SLionel Sambuc dep_w_size equ 24h+zlib1222add 103*44bedb31SLionel Sambuc dep_prev equ 38h+zlib1222add 104*44bedb31SLionel Sambuc dep_w_mask equ 2ch+zlib1222add 105*44bedb31SLionel Sambuc dep_good_match equ 84h+zlib1222add 106*44bedb31SLionel Sambuc dep_match_start equ 68h+zlib1222add 107*44bedb31SLionel Sambuc dep_lookahead equ 6ch+zlib1222add 108*44bedb31SLionel Sambuc 109*44bedb31SLionel Sambuc 110*44bedb31SLionel Sambuc_TEXT segment 111*44bedb31SLionel Sambuc 112*44bedb31SLionel SambucIFDEF NOUNDERLINE 113*44bedb31SLionel Sambuc IFDEF NOOLDPENTIUMCODE 114*44bedb31SLionel Sambuc public longest_match 115*44bedb31SLionel Sambuc public match_init 116*44bedb31SLionel Sambuc ELSE 117*44bedb31SLionel Sambuc public longest_match_7fff 118*44bedb31SLionel Sambuc public cpudetect32 119*44bedb31SLionel Sambuc public longest_match_686 120*44bedb31SLionel Sambuc ENDIF 121*44bedb31SLionel SambucELSE 122*44bedb31SLionel Sambuc IFDEF NOOLDPENTIUMCODE 123*44bedb31SLionel Sambuc public _longest_match 124*44bedb31SLionel Sambuc public _match_init 125*44bedb31SLionel Sambuc ELSE 126*44bedb31SLionel Sambuc public _longest_match_7fff 127*44bedb31SLionel Sambuc public _cpudetect32 128*44bedb31SLionel Sambuc public _longest_match_686 129*44bedb31SLionel Sambuc ENDIF 130*44bedb31SLionel SambucENDIF 131*44bedb31SLionel Sambuc 132*44bedb31SLionel Sambuc MAX_MATCH equ 258 133*44bedb31SLionel Sambuc MIN_MATCH equ 3 134*44bedb31SLionel Sambuc MIN_LOOKAHEAD equ (MAX_MATCH+MIN_MATCH+1) 135*44bedb31SLionel Sambuc 136*44bedb31SLionel Sambuc 137*44bedb31SLionel Sambuc 138*44bedb31SLionel SambucIFNDEF NOOLDPENTIUMCODE 139*44bedb31SLionel SambucIFDEF NOUNDERLINE 140*44bedb31SLionel Sambuclongest_match_7fff proc near 141*44bedb31SLionel SambucELSE 142*44bedb31SLionel Sambuc_longest_match_7fff proc near 143*44bedb31SLionel SambucENDIF 144*44bedb31SLionel Sambuc 145*44bedb31SLionel Sambuc mov edx,[esp+4] 146*44bedb31SLionel Sambuc 147*44bedb31SLionel Sambuc 148*44bedb31SLionel Sambuc 149*44bedb31SLionel Sambuc push ebp 150*44bedb31SLionel Sambuc push edi 151*44bedb31SLionel Sambuc push esi 152*44bedb31SLionel Sambuc push ebx 153*44bedb31SLionel Sambuc 154*44bedb31SLionel Sambuc sub esp,NbStackAdd 155*44bedb31SLionel Sambuc 156*44bedb31SLionel Sambuc; initialize or check the variables used in match.asm. 157*44bedb31SLionel Sambuc mov ebp,edx 158*44bedb31SLionel Sambuc 159*44bedb31SLionel Sambuc; chain_length = s->max_chain_length 160*44bedb31SLionel Sambuc; if (prev_length>=good_match) chain_length >>= 2 161*44bedb31SLionel Sambuc mov edx,[ebp+dep_chain_length] 162*44bedb31SLionel Sambuc mov ebx,[ebp+dep_prev_length] 163*44bedb31SLionel Sambuc cmp [ebp+dep_good_match],ebx 164*44bedb31SLionel Sambuc ja noshr 165*44bedb31SLionel Sambuc shr edx,2 166*44bedb31SLionel Sambucnoshr: 167*44bedb31SLionel Sambuc; we increment chain_length because in the asm, the --chain_lenght is in the beginning of the loop 168*44bedb31SLionel Sambuc inc edx 169*44bedb31SLionel Sambuc mov edi,[ebp+dep_nice_match] 170*44bedb31SLionel Sambuc mov chain_length,edx 171*44bedb31SLionel Sambuc mov eax,[ebp+dep_lookahead] 172*44bedb31SLionel Sambuc cmp eax,edi 173*44bedb31SLionel Sambuc; if ((uInt)nice_match > s->lookahead) nice_match = s->lookahead; 174*44bedb31SLionel Sambuc jae nolookaheadnicematch 175*44bedb31SLionel Sambuc mov edi,eax 176*44bedb31SLionel Sambucnolookaheadnicematch: 177*44bedb31SLionel Sambuc; best_len = s->prev_length 178*44bedb31SLionel Sambuc mov best_len,ebx 179*44bedb31SLionel Sambuc 180*44bedb31SLionel Sambuc; window = s->window 181*44bedb31SLionel Sambuc mov esi,[ebp+dep_window] 182*44bedb31SLionel Sambuc mov ecx,[ebp+dep_strstart] 183*44bedb31SLionel Sambuc mov window,esi 184*44bedb31SLionel Sambuc 185*44bedb31SLionel Sambuc mov nice_match,edi 186*44bedb31SLionel Sambuc; scan = window + strstart 187*44bedb31SLionel Sambuc add esi,ecx 188*44bedb31SLionel Sambuc mov scan,esi 189*44bedb31SLionel Sambuc; dx = *window 190*44bedb31SLionel Sambuc mov dx,word ptr [esi] 191*44bedb31SLionel Sambuc; bx = *(window+best_len-1) 192*44bedb31SLionel Sambuc mov bx,word ptr [esi+ebx-1] 193*44bedb31SLionel Sambuc add esi,MAX_MATCH-1 194*44bedb31SLionel Sambuc; scan_start = *scan 195*44bedb31SLionel Sambuc mov scan_start,dx 196*44bedb31SLionel Sambuc; strend = scan + MAX_MATCH-1 197*44bedb31SLionel Sambuc mov strend,esi 198*44bedb31SLionel Sambuc; bx = scan_end = *(window+best_len-1) 199*44bedb31SLionel Sambuc 200*44bedb31SLionel Sambuc; IPos limit = s->strstart > (IPos)MAX_DIST(s) ? 201*44bedb31SLionel Sambuc; s->strstart - (IPos)MAX_DIST(s) : NIL; 202*44bedb31SLionel Sambuc 203*44bedb31SLionel Sambuc mov esi,[ebp+dep_w_size] 204*44bedb31SLionel Sambuc sub esi,MIN_LOOKAHEAD 205*44bedb31SLionel Sambuc; here esi = MAX_DIST(s) 206*44bedb31SLionel Sambuc sub ecx,esi 207*44bedb31SLionel Sambuc ja nodist 208*44bedb31SLionel Sambuc xor ecx,ecx 209*44bedb31SLionel Sambucnodist: 210*44bedb31SLionel Sambuc mov limit,ecx 211*44bedb31SLionel Sambuc 212*44bedb31SLionel Sambuc; prev = s->prev 213*44bedb31SLionel Sambuc mov edx,[ebp+dep_prev] 214*44bedb31SLionel Sambuc mov prev,edx 215*44bedb31SLionel Sambuc 216*44bedb31SLionel Sambuc; 217*44bedb31SLionel Sambuc mov edx,dword ptr [ebp+dep_match_start] 218*44bedb31SLionel Sambuc mov bp,scan_start 219*44bedb31SLionel Sambuc mov eax,cur_match 220*44bedb31SLionel Sambuc mov match_start,edx 221*44bedb31SLionel Sambuc 222*44bedb31SLionel Sambuc mov edx,window 223*44bedb31SLionel Sambuc mov edi,edx 224*44bedb31SLionel Sambuc add edi,best_len 225*44bedb31SLionel Sambuc mov esi,prev 226*44bedb31SLionel Sambuc dec edi 227*44bedb31SLionel Sambuc; windowlen = window + best_len -1 228*44bedb31SLionel Sambuc mov windowlen,edi 229*44bedb31SLionel Sambuc 230*44bedb31SLionel Sambuc jmp beginloop2 231*44bedb31SLionel Sambuc align 4 232*44bedb31SLionel Sambuc 233*44bedb31SLionel Sambuc; here, in the loop 234*44bedb31SLionel Sambuc; eax = ax = cur_match 235*44bedb31SLionel Sambuc; ecx = limit 236*44bedb31SLionel Sambuc; bx = scan_end 237*44bedb31SLionel Sambuc; bp = scan_start 238*44bedb31SLionel Sambuc; edi = windowlen (window + best_len -1) 239*44bedb31SLionel Sambuc; esi = prev 240*44bedb31SLionel Sambuc 241*44bedb31SLionel Sambuc 242*44bedb31SLionel Sambuc;// here; chain_length <=16 243*44bedb31SLionel Sambucnormalbeg0add16: 244*44bedb31SLionel Sambuc add chain_length,16 245*44bedb31SLionel Sambuc jz exitloop 246*44bedb31SLionel Sambucnormalbeg0: 247*44bedb31SLionel Sambuc cmp word ptr[edi+eax],bx 248*44bedb31SLionel Sambuc je normalbeg2noroll 249*44bedb31SLionel Sambucrcontlabnoroll: 250*44bedb31SLionel Sambuc; cur_match = prev[cur_match & wmask] 251*44bedb31SLionel Sambuc and eax,7fffh 252*44bedb31SLionel Sambuc mov ax,word ptr[esi+eax*2] 253*44bedb31SLionel Sambuc; if cur_match > limit, go to exitloop 254*44bedb31SLionel Sambuc cmp ecx,eax 255*44bedb31SLionel Sambuc jnb exitloop 256*44bedb31SLionel Sambuc; if --chain_length != 0, go to exitloop 257*44bedb31SLionel Sambuc dec chain_length 258*44bedb31SLionel Sambuc jnz normalbeg0 259*44bedb31SLionel Sambuc jmp exitloop 260*44bedb31SLionel Sambuc 261*44bedb31SLionel Sambucnormalbeg2noroll: 262*44bedb31SLionel Sambuc; if (scan_start==*(cur_match+window)) goto normalbeg2 263*44bedb31SLionel Sambuc cmp bp,word ptr[edx+eax] 264*44bedb31SLionel Sambuc jne rcontlabnoroll 265*44bedb31SLionel Sambuc jmp normalbeg2 266*44bedb31SLionel Sambuc 267*44bedb31SLionel Sambuccontloop3: 268*44bedb31SLionel Sambuc mov edi,windowlen 269*44bedb31SLionel Sambuc 270*44bedb31SLionel Sambuc; cur_match = prev[cur_match & wmask] 271*44bedb31SLionel Sambuc and eax,7fffh 272*44bedb31SLionel Sambuc mov ax,word ptr[esi+eax*2] 273*44bedb31SLionel Sambuc; if cur_match > limit, go to exitloop 274*44bedb31SLionel Sambuc cmp ecx,eax 275*44bedb31SLionel Sambucjnbexitloopshort1: 276*44bedb31SLionel Sambuc jnb exitloop 277*44bedb31SLionel Sambuc; if --chain_length != 0, go to exitloop 278*44bedb31SLionel Sambuc 279*44bedb31SLionel Sambuc 280*44bedb31SLionel Sambuc; begin the main loop 281*44bedb31SLionel Sambucbeginloop2: 282*44bedb31SLionel Sambuc sub chain_length,16+1 283*44bedb31SLionel Sambuc; if chain_length <=16, don't use the unrolled loop 284*44bedb31SLionel Sambuc jna normalbeg0add16 285*44bedb31SLionel Sambuc 286*44bedb31SLionel Sambucdo16: 287*44bedb31SLionel Sambuc cmp word ptr[edi+eax],bx 288*44bedb31SLionel Sambuc je normalbeg2dc0 289*44bedb31SLionel Sambuc 290*44bedb31SLionel Sambucmaccn MACRO lab 291*44bedb31SLionel Sambuc and eax,7fffh 292*44bedb31SLionel Sambuc mov ax,word ptr[esi+eax*2] 293*44bedb31SLionel Sambuc cmp ecx,eax 294*44bedb31SLionel Sambuc jnb exitloop 295*44bedb31SLionel Sambuc cmp word ptr[edi+eax],bx 296*44bedb31SLionel Sambuc je lab 297*44bedb31SLionel Sambuc ENDM 298*44bedb31SLionel Sambuc 299*44bedb31SLionel Sambucrcontloop0: 300*44bedb31SLionel Sambuc maccn normalbeg2dc1 301*44bedb31SLionel Sambuc 302*44bedb31SLionel Sambucrcontloop1: 303*44bedb31SLionel Sambuc maccn normalbeg2dc2 304*44bedb31SLionel Sambuc 305*44bedb31SLionel Sambucrcontloop2: 306*44bedb31SLionel Sambuc maccn normalbeg2dc3 307*44bedb31SLionel Sambuc 308*44bedb31SLionel Sambucrcontloop3: 309*44bedb31SLionel Sambuc maccn normalbeg2dc4 310*44bedb31SLionel Sambuc 311*44bedb31SLionel Sambucrcontloop4: 312*44bedb31SLionel Sambuc maccn normalbeg2dc5 313*44bedb31SLionel Sambuc 314*44bedb31SLionel Sambucrcontloop5: 315*44bedb31SLionel Sambuc maccn normalbeg2dc6 316*44bedb31SLionel Sambuc 317*44bedb31SLionel Sambucrcontloop6: 318*44bedb31SLionel Sambuc maccn normalbeg2dc7 319*44bedb31SLionel Sambuc 320*44bedb31SLionel Sambucrcontloop7: 321*44bedb31SLionel Sambuc maccn normalbeg2dc8 322*44bedb31SLionel Sambuc 323*44bedb31SLionel Sambucrcontloop8: 324*44bedb31SLionel Sambuc maccn normalbeg2dc9 325*44bedb31SLionel Sambuc 326*44bedb31SLionel Sambucrcontloop9: 327*44bedb31SLionel Sambuc maccn normalbeg2dc10 328*44bedb31SLionel Sambuc 329*44bedb31SLionel Sambucrcontloop10: 330*44bedb31SLionel Sambuc maccn short normalbeg2dc11 331*44bedb31SLionel Sambuc 332*44bedb31SLionel Sambucrcontloop11: 333*44bedb31SLionel Sambuc maccn short normalbeg2dc12 334*44bedb31SLionel Sambuc 335*44bedb31SLionel Sambucrcontloop12: 336*44bedb31SLionel Sambuc maccn short normalbeg2dc13 337*44bedb31SLionel Sambuc 338*44bedb31SLionel Sambucrcontloop13: 339*44bedb31SLionel Sambuc maccn short normalbeg2dc14 340*44bedb31SLionel Sambuc 341*44bedb31SLionel Sambucrcontloop14: 342*44bedb31SLionel Sambuc maccn short normalbeg2dc15 343*44bedb31SLionel Sambuc 344*44bedb31SLionel Sambucrcontloop15: 345*44bedb31SLionel Sambuc and eax,7fffh 346*44bedb31SLionel Sambuc mov ax,word ptr[esi+eax*2] 347*44bedb31SLionel Sambuc cmp ecx,eax 348*44bedb31SLionel Sambuc jnb exitloop 349*44bedb31SLionel Sambuc 350*44bedb31SLionel Sambuc sub chain_length,16 351*44bedb31SLionel Sambuc ja do16 352*44bedb31SLionel Sambuc jmp normalbeg0add16 353*44bedb31SLionel Sambuc 354*44bedb31SLionel Sambuc;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 355*44bedb31SLionel Sambuc 356*44bedb31SLionel Sambucnormbeg MACRO rcontlab,valsub 357*44bedb31SLionel Sambuc; if we are here, we know that *(match+best_len-1) == scan_end 358*44bedb31SLionel Sambuc cmp bp,word ptr[edx+eax] 359*44bedb31SLionel Sambuc; if (match != scan_start) goto rcontlab 360*44bedb31SLionel Sambuc jne rcontlab 361*44bedb31SLionel Sambuc; calculate the good chain_length, and we'll compare scan and match string 362*44bedb31SLionel Sambuc add chain_length,16-valsub 363*44bedb31SLionel Sambuc jmp iseq 364*44bedb31SLionel Sambuc ENDM 365*44bedb31SLionel Sambuc 366*44bedb31SLionel Sambuc 367*44bedb31SLionel Sambucnormalbeg2dc11: 368*44bedb31SLionel Sambuc normbeg rcontloop11,11 369*44bedb31SLionel Sambuc 370*44bedb31SLionel Sambucnormalbeg2dc12: 371*44bedb31SLionel Sambuc normbeg short rcontloop12,12 372*44bedb31SLionel Sambuc 373*44bedb31SLionel Sambucnormalbeg2dc13: 374*44bedb31SLionel Sambuc normbeg short rcontloop13,13 375*44bedb31SLionel Sambuc 376*44bedb31SLionel Sambucnormalbeg2dc14: 377*44bedb31SLionel Sambuc normbeg short rcontloop14,14 378*44bedb31SLionel Sambuc 379*44bedb31SLionel Sambucnormalbeg2dc15: 380*44bedb31SLionel Sambuc normbeg short rcontloop15,15 381*44bedb31SLionel Sambuc 382*44bedb31SLionel Sambucnormalbeg2dc10: 383*44bedb31SLionel Sambuc normbeg rcontloop10,10 384*44bedb31SLionel Sambuc 385*44bedb31SLionel Sambucnormalbeg2dc9: 386*44bedb31SLionel Sambuc normbeg rcontloop9,9 387*44bedb31SLionel Sambuc 388*44bedb31SLionel Sambucnormalbeg2dc8: 389*44bedb31SLionel Sambuc normbeg rcontloop8,8 390*44bedb31SLionel Sambuc 391*44bedb31SLionel Sambucnormalbeg2dc7: 392*44bedb31SLionel Sambuc normbeg rcontloop7,7 393*44bedb31SLionel Sambuc 394*44bedb31SLionel Sambucnormalbeg2dc6: 395*44bedb31SLionel Sambuc normbeg rcontloop6,6 396*44bedb31SLionel Sambuc 397*44bedb31SLionel Sambucnormalbeg2dc5: 398*44bedb31SLionel Sambuc normbeg rcontloop5,5 399*44bedb31SLionel Sambuc 400*44bedb31SLionel Sambucnormalbeg2dc4: 401*44bedb31SLionel Sambuc normbeg rcontloop4,4 402*44bedb31SLionel Sambuc 403*44bedb31SLionel Sambucnormalbeg2dc3: 404*44bedb31SLionel Sambuc normbeg rcontloop3,3 405*44bedb31SLionel Sambuc 406*44bedb31SLionel Sambucnormalbeg2dc2: 407*44bedb31SLionel Sambuc normbeg rcontloop2,2 408*44bedb31SLionel Sambuc 409*44bedb31SLionel Sambucnormalbeg2dc1: 410*44bedb31SLionel Sambuc normbeg rcontloop1,1 411*44bedb31SLionel Sambuc 412*44bedb31SLionel Sambucnormalbeg2dc0: 413*44bedb31SLionel Sambuc normbeg rcontloop0,0 414*44bedb31SLionel Sambuc 415*44bedb31SLionel Sambuc 416*44bedb31SLionel Sambuc; we go in normalbeg2 because *(ushf*)(match+best_len-1) == scan_end 417*44bedb31SLionel Sambuc 418*44bedb31SLionel Sambucnormalbeg2: 419*44bedb31SLionel Sambuc mov edi,window 420*44bedb31SLionel Sambuc 421*44bedb31SLionel Sambuc cmp bp,word ptr[edi+eax] 422*44bedb31SLionel Sambuc jne contloop3 ; if *(ushf*)match != scan_start, continue 423*44bedb31SLionel Sambuc 424*44bedb31SLionel Sambuciseq: 425*44bedb31SLionel Sambuc; if we are here, we know that *(match+best_len-1) == scan_end 426*44bedb31SLionel Sambuc; and (match == scan_start) 427*44bedb31SLionel Sambuc 428*44bedb31SLionel Sambuc mov edi,edx 429*44bedb31SLionel Sambuc mov esi,scan ; esi = scan 430*44bedb31SLionel Sambuc add edi,eax ; edi = window + cur_match = match 431*44bedb31SLionel Sambuc 432*44bedb31SLionel Sambuc mov edx,[esi+3] ; compare manually dword at match+3 433*44bedb31SLionel Sambuc xor edx,[edi+3] ; and scan +3 434*44bedb31SLionel Sambuc 435*44bedb31SLionel Sambuc jz begincompare ; if equal, go to long compare 436*44bedb31SLionel Sambuc 437*44bedb31SLionel Sambuc; we will determine the unmatch byte and calculate len (in esi) 438*44bedb31SLionel Sambuc or dl,dl 439*44bedb31SLionel Sambuc je eq1rr 440*44bedb31SLionel Sambuc mov esi,3 441*44bedb31SLionel Sambuc jmp trfinval 442*44bedb31SLionel Sambuceq1rr: 443*44bedb31SLionel Sambuc or dx,dx 444*44bedb31SLionel Sambuc je eq1 445*44bedb31SLionel Sambuc 446*44bedb31SLionel Sambuc mov esi,4 447*44bedb31SLionel Sambuc jmp trfinval 448*44bedb31SLionel Sambuceq1: 449*44bedb31SLionel Sambuc and edx,0ffffffh 450*44bedb31SLionel Sambuc jz eq11 451*44bedb31SLionel Sambuc mov esi,5 452*44bedb31SLionel Sambuc jmp trfinval 453*44bedb31SLionel Sambuceq11: 454*44bedb31SLionel Sambuc mov esi,6 455*44bedb31SLionel Sambuc jmp trfinval 456*44bedb31SLionel Sambuc 457*44bedb31SLionel Sambucbegincompare: 458*44bedb31SLionel Sambuc ; here we now scan and match begin same 459*44bedb31SLionel Sambuc add edi,6 460*44bedb31SLionel Sambuc add esi,6 461*44bedb31SLionel Sambuc mov ecx,(MAX_MATCH-(2+4))/4 ; scan for at most MAX_MATCH bytes 462*44bedb31SLionel Sambuc repe cmpsd ; loop until mismatch 463*44bedb31SLionel Sambuc 464*44bedb31SLionel Sambuc je trfin ; go to trfin if not unmatch 465*44bedb31SLionel Sambuc; we determine the unmatch byte 466*44bedb31SLionel Sambuc sub esi,4 467*44bedb31SLionel Sambuc mov edx,[edi-4] 468*44bedb31SLionel Sambuc xor edx,[esi] 469*44bedb31SLionel Sambuc 470*44bedb31SLionel Sambuc or dl,dl 471*44bedb31SLionel Sambuc jnz trfin 472*44bedb31SLionel Sambuc inc esi 473*44bedb31SLionel Sambuc 474*44bedb31SLionel Sambuc or dx,dx 475*44bedb31SLionel Sambuc jnz trfin 476*44bedb31SLionel Sambuc inc esi 477*44bedb31SLionel Sambuc 478*44bedb31SLionel Sambuc and edx,0ffffffh 479*44bedb31SLionel Sambuc jnz trfin 480*44bedb31SLionel Sambuc inc esi 481*44bedb31SLionel Sambuc 482*44bedb31SLionel Sambuctrfin: 483*44bedb31SLionel Sambuc sub esi,scan ; esi = len 484*44bedb31SLionel Sambuctrfinval: 485*44bedb31SLionel Sambuc; here we have finised compare, and esi contain len of equal string 486*44bedb31SLionel Sambuc cmp esi,best_len ; if len > best_len, go newbestlen 487*44bedb31SLionel Sambuc ja short newbestlen 488*44bedb31SLionel Sambuc; now we restore edx, ecx and esi, for the big loop 489*44bedb31SLionel Sambuc mov esi,prev 490*44bedb31SLionel Sambuc mov ecx,limit 491*44bedb31SLionel Sambuc mov edx,window 492*44bedb31SLionel Sambuc jmp contloop3 493*44bedb31SLionel Sambuc 494*44bedb31SLionel Sambucnewbestlen: 495*44bedb31SLionel Sambuc mov best_len,esi ; len become best_len 496*44bedb31SLionel Sambuc 497*44bedb31SLionel Sambuc mov match_start,eax ; save new position as match_start 498*44bedb31SLionel Sambuc cmp esi,nice_match ; if best_len >= nice_match, exit 499*44bedb31SLionel Sambuc jae exitloop 500*44bedb31SLionel Sambuc mov ecx,scan 501*44bedb31SLionel Sambuc mov edx,window ; restore edx=window 502*44bedb31SLionel Sambuc add ecx,esi 503*44bedb31SLionel Sambuc add esi,edx 504*44bedb31SLionel Sambuc 505*44bedb31SLionel Sambuc dec esi 506*44bedb31SLionel Sambuc mov windowlen,esi ; windowlen = window + best_len-1 507*44bedb31SLionel Sambuc mov bx,[ecx-1] ; bx = *(scan+best_len-1) = scan_end 508*44bedb31SLionel Sambuc 509*44bedb31SLionel Sambuc; now we restore ecx and esi, for the big loop : 510*44bedb31SLionel Sambuc mov esi,prev 511*44bedb31SLionel Sambuc mov ecx,limit 512*44bedb31SLionel Sambuc jmp contloop3 513*44bedb31SLionel Sambuc 514*44bedb31SLionel Sambucexitloop: 515*44bedb31SLionel Sambuc; exit : s->match_start=match_start 516*44bedb31SLionel Sambuc mov ebx,match_start 517*44bedb31SLionel Sambuc mov ebp,str_s 518*44bedb31SLionel Sambuc mov ecx,best_len 519*44bedb31SLionel Sambuc mov dword ptr [ebp+dep_match_start],ebx 520*44bedb31SLionel Sambuc mov eax,dword ptr [ebp+dep_lookahead] 521*44bedb31SLionel Sambuc cmp ecx,eax 522*44bedb31SLionel Sambuc ja minexlo 523*44bedb31SLionel Sambuc mov eax,ecx 524*44bedb31SLionel Sambucminexlo: 525*44bedb31SLionel Sambuc; return min(best_len,s->lookahead) 526*44bedb31SLionel Sambuc 527*44bedb31SLionel Sambuc; restore stack and register ebx,esi,edi,ebp 528*44bedb31SLionel Sambuc add esp,NbStackAdd 529*44bedb31SLionel Sambuc 530*44bedb31SLionel Sambuc pop ebx 531*44bedb31SLionel Sambuc pop esi 532*44bedb31SLionel Sambuc pop edi 533*44bedb31SLionel Sambuc pop ebp 534*44bedb31SLionel Sambuc ret 535*44bedb31SLionel SambucInfoAuthor: 536*44bedb31SLionel Sambuc; please don't remove this string ! 537*44bedb31SLionel Sambuc; Your are free use gvmat32 in any fre or commercial apps if you don't remove the string in the binary! 538*44bedb31SLionel Sambuc db 0dh,0ah,"GVMat32 optimised assembly code written 1996-98 by Gilles Vollant",0dh,0ah 539*44bedb31SLionel Sambuc 540*44bedb31SLionel Sambuc 541*44bedb31SLionel Sambuc 542*44bedb31SLionel SambucIFDEF NOUNDERLINE 543*44bedb31SLionel Sambuclongest_match_7fff endp 544*44bedb31SLionel SambucELSE 545*44bedb31SLionel Sambuc_longest_match_7fff endp 546*44bedb31SLionel SambucENDIF 547*44bedb31SLionel Sambuc 548*44bedb31SLionel Sambuc 549*44bedb31SLionel SambucIFDEF NOUNDERLINE 550*44bedb31SLionel Sambuccpudetect32 proc near 551*44bedb31SLionel SambucELSE 552*44bedb31SLionel Sambuc_cpudetect32 proc near 553*44bedb31SLionel SambucENDIF 554*44bedb31SLionel Sambuc 555*44bedb31SLionel Sambuc push ebx 556*44bedb31SLionel Sambuc 557*44bedb31SLionel Sambuc pushfd ; push original EFLAGS 558*44bedb31SLionel Sambuc pop eax ; get original EFLAGS 559*44bedb31SLionel Sambuc mov ecx, eax ; save original EFLAGS 560*44bedb31SLionel Sambuc xor eax, 40000h ; flip AC bit in EFLAGS 561*44bedb31SLionel Sambuc push eax ; save new EFLAGS value on stack 562*44bedb31SLionel Sambuc popfd ; replace current EFLAGS value 563*44bedb31SLionel Sambuc pushfd ; get new EFLAGS 564*44bedb31SLionel Sambuc pop eax ; store new EFLAGS in EAX 565*44bedb31SLionel Sambuc xor eax, ecx ; can�t toggle AC bit, processor=80386 566*44bedb31SLionel Sambuc jz end_cpu_is_386 ; jump if 80386 processor 567*44bedb31SLionel Sambuc push ecx 568*44bedb31SLionel Sambuc popfd ; restore AC bit in EFLAGS first 569*44bedb31SLionel Sambuc 570*44bedb31SLionel Sambuc pushfd 571*44bedb31SLionel Sambuc pushfd 572*44bedb31SLionel Sambuc pop ecx 573*44bedb31SLionel Sambuc 574*44bedb31SLionel Sambuc mov eax, ecx ; get original EFLAGS 575*44bedb31SLionel Sambuc xor eax, 200000h ; flip ID bit in EFLAGS 576*44bedb31SLionel Sambuc push eax ; save new EFLAGS value on stack 577*44bedb31SLionel Sambuc popfd ; replace current EFLAGS value 578*44bedb31SLionel Sambuc pushfd ; get new EFLAGS 579*44bedb31SLionel Sambuc pop eax ; store new EFLAGS in EAX 580*44bedb31SLionel Sambuc popfd ; restore original EFLAGS 581*44bedb31SLionel Sambuc xor eax, ecx ; can�t toggle ID bit, 582*44bedb31SLionel Sambuc je is_old_486 ; processor=old 583*44bedb31SLionel Sambuc 584*44bedb31SLionel Sambuc mov eax,1 585*44bedb31SLionel Sambuc db 0fh,0a2h ;CPUID 586*44bedb31SLionel Sambuc 587*44bedb31SLionel Sambucexitcpudetect: 588*44bedb31SLionel Sambuc pop ebx 589*44bedb31SLionel Sambuc ret 590*44bedb31SLionel Sambuc 591*44bedb31SLionel Sambucend_cpu_is_386: 592*44bedb31SLionel Sambuc mov eax,0300h 593*44bedb31SLionel Sambuc jmp exitcpudetect 594*44bedb31SLionel Sambuc 595*44bedb31SLionel Sambucis_old_486: 596*44bedb31SLionel Sambuc mov eax,0400h 597*44bedb31SLionel Sambuc jmp exitcpudetect 598*44bedb31SLionel Sambuc 599*44bedb31SLionel SambucIFDEF NOUNDERLINE 600*44bedb31SLionel Sambuccpudetect32 endp 601*44bedb31SLionel SambucELSE 602*44bedb31SLionel Sambuc_cpudetect32 endp 603*44bedb31SLionel SambucENDIF 604*44bedb31SLionel SambucENDIF 605*44bedb31SLionel Sambuc 606*44bedb31SLionel SambucMAX_MATCH equ 258 607*44bedb31SLionel SambucMIN_MATCH equ 3 608*44bedb31SLionel SambucMIN_LOOKAHEAD equ (MAX_MATCH + MIN_MATCH + 1) 609*44bedb31SLionel SambucMAX_MATCH_8_ equ ((MAX_MATCH + 7) AND 0FFF0h) 610*44bedb31SLionel Sambuc 611*44bedb31SLionel Sambuc 612*44bedb31SLionel Sambuc;;; stack frame offsets 613*44bedb31SLionel Sambuc 614*44bedb31SLionel Sambucchainlenwmask equ esp + 0 ; high word: current chain len 615*44bedb31SLionel Sambuc ; low word: s->wmask 616*44bedb31SLionel Sambucwindow equ esp + 4 ; local copy of s->window 617*44bedb31SLionel Sambucwindowbestlen equ esp + 8 ; s->window + bestlen 618*44bedb31SLionel Sambucscanstart equ esp + 16 ; first two bytes of string 619*44bedb31SLionel Sambucscanend equ esp + 12 ; last two bytes of string 620*44bedb31SLionel Sambucscanalign equ esp + 20 ; dword-misalignment of string 621*44bedb31SLionel Sambucnicematch equ esp + 24 ; a good enough match size 622*44bedb31SLionel Sambucbestlen equ esp + 28 ; size of best match so far 623*44bedb31SLionel Sambucscan equ esp + 32 ; ptr to string wanting match 624*44bedb31SLionel Sambuc 625*44bedb31SLionel SambucLocalVarsSize equ 36 626*44bedb31SLionel Sambuc; saved ebx byte esp + 36 627*44bedb31SLionel Sambuc; saved edi byte esp + 40 628*44bedb31SLionel Sambuc; saved esi byte esp + 44 629*44bedb31SLionel Sambuc; saved ebp byte esp + 48 630*44bedb31SLionel Sambuc; return address byte esp + 52 631*44bedb31SLionel Sambucdeflatestate equ esp + 56 ; the function arguments 632*44bedb31SLionel Sambuccurmatch equ esp + 60 633*44bedb31SLionel Sambuc 634*44bedb31SLionel Sambuc;;; Offsets for fields in the deflate_state structure. These numbers 635*44bedb31SLionel Sambuc;;; are calculated from the definition of deflate_state, with the 636*44bedb31SLionel Sambuc;;; assumption that the compiler will dword-align the fields. (Thus, 637*44bedb31SLionel Sambuc;;; changing the definition of deflate_state could easily cause this 638*44bedb31SLionel Sambuc;;; program to crash horribly, without so much as a warning at 639*44bedb31SLionel Sambuc;;; compile time. Sigh.) 640*44bedb31SLionel Sambuc 641*44bedb31SLionel SambucdsWSize equ 36+zlib1222add 642*44bedb31SLionel SambucdsWMask equ 44+zlib1222add 643*44bedb31SLionel SambucdsWindow equ 48+zlib1222add 644*44bedb31SLionel SambucdsPrev equ 56+zlib1222add 645*44bedb31SLionel SambucdsMatchLen equ 88+zlib1222add 646*44bedb31SLionel SambucdsPrevMatch equ 92+zlib1222add 647*44bedb31SLionel SambucdsStrStart equ 100+zlib1222add 648*44bedb31SLionel SambucdsMatchStart equ 104+zlib1222add 649*44bedb31SLionel SambucdsLookahead equ 108+zlib1222add 650*44bedb31SLionel SambucdsPrevLen equ 112+zlib1222add 651*44bedb31SLionel SambucdsMaxChainLen equ 116+zlib1222add 652*44bedb31SLionel SambucdsGoodMatch equ 132+zlib1222add 653*44bedb31SLionel SambucdsNiceMatch equ 136+zlib1222add 654*44bedb31SLionel Sambuc 655*44bedb31SLionel Sambuc 656*44bedb31SLionel Sambuc;;; match.asm -- Pentium-Pro-optimized version of longest_match() 657*44bedb31SLionel Sambuc;;; Written for zlib 1.1.2 658*44bedb31SLionel Sambuc;;; Copyright (C) 1998 Brian Raiter <breadbox@muppetlabs.com> 659*44bedb31SLionel Sambuc;;; You can look at http://www.muppetlabs.com/~breadbox/software/assembly.html 660*44bedb31SLionel Sambuc;;; 661*44bedb31SLionel Sambuc;;; This is free software; you can redistribute it and/or modify it 662*44bedb31SLionel Sambuc;;; under the terms of the GNU General Public License. 663*44bedb31SLionel Sambuc 664*44bedb31SLionel Sambuc;GLOBAL _longest_match, _match_init 665*44bedb31SLionel Sambuc 666*44bedb31SLionel Sambuc 667*44bedb31SLionel Sambuc;SECTION .text 668*44bedb31SLionel Sambuc 669*44bedb31SLionel Sambuc;;; uInt longest_match(deflate_state *deflatestate, IPos curmatch) 670*44bedb31SLionel Sambuc 671*44bedb31SLionel Sambuc;_longest_match: 672*44bedb31SLionel SambucIFDEF NOOLDPENTIUMCODE 673*44bedb31SLionel Sambuc IFDEF NOUNDERLINE 674*44bedb31SLionel Sambuc longest_match proc near 675*44bedb31SLionel Sambuc ELSE 676*44bedb31SLionel Sambuc _longest_match proc near 677*44bedb31SLionel Sambuc ENDIF 678*44bedb31SLionel SambucELSE 679*44bedb31SLionel Sambuc IFDEF NOUNDERLINE 680*44bedb31SLionel Sambuc longest_match_686 proc near 681*44bedb31SLionel Sambuc ELSE 682*44bedb31SLionel Sambuc _longest_match_686 proc near 683*44bedb31SLionel Sambuc ENDIF 684*44bedb31SLionel SambucENDIF 685*44bedb31SLionel Sambuc 686*44bedb31SLionel Sambuc;;; Save registers that the compiler may be using, and adjust esp to 687*44bedb31SLionel Sambuc;;; make room for our stack frame. 688*44bedb31SLionel Sambuc 689*44bedb31SLionel Sambuc push ebp 690*44bedb31SLionel Sambuc push edi 691*44bedb31SLionel Sambuc push esi 692*44bedb31SLionel Sambuc push ebx 693*44bedb31SLionel Sambuc sub esp, LocalVarsSize 694*44bedb31SLionel Sambuc 695*44bedb31SLionel Sambuc;;; Retrieve the function arguments. ecx will hold cur_match 696*44bedb31SLionel Sambuc;;; throughout the entire function. edx will hold the pointer to the 697*44bedb31SLionel Sambuc;;; deflate_state structure during the function's setup (before 698*44bedb31SLionel Sambuc;;; entering the main loop. 699*44bedb31SLionel Sambuc 700*44bedb31SLionel Sambuc mov edx, [deflatestate] 701*44bedb31SLionel Sambuc mov ecx, [curmatch] 702*44bedb31SLionel Sambuc 703*44bedb31SLionel Sambuc;;; uInt wmask = s->w_mask; 704*44bedb31SLionel Sambuc;;; unsigned chain_length = s->max_chain_length; 705*44bedb31SLionel Sambuc;;; if (s->prev_length >= s->good_match) { 706*44bedb31SLionel Sambuc;;; chain_length >>= 2; 707*44bedb31SLionel Sambuc;;; } 708*44bedb31SLionel Sambuc 709*44bedb31SLionel Sambuc mov eax, [edx + dsPrevLen] 710*44bedb31SLionel Sambuc mov ebx, [edx + dsGoodMatch] 711*44bedb31SLionel Sambuc cmp eax, ebx 712*44bedb31SLionel Sambuc mov eax, [edx + dsWMask] 713*44bedb31SLionel Sambuc mov ebx, [edx + dsMaxChainLen] 714*44bedb31SLionel Sambuc jl LastMatchGood 715*44bedb31SLionel Sambuc shr ebx, 2 716*44bedb31SLionel SambucLastMatchGood: 717*44bedb31SLionel Sambuc 718*44bedb31SLionel Sambuc;;; chainlen is decremented once beforehand so that the function can 719*44bedb31SLionel Sambuc;;; use the sign flag instead of the zero flag for the exit test. 720*44bedb31SLionel Sambuc;;; It is then shifted into the high word, to make room for the wmask 721*44bedb31SLionel Sambuc;;; value, which it will always accompany. 722*44bedb31SLionel Sambuc 723*44bedb31SLionel Sambuc dec ebx 724*44bedb31SLionel Sambuc shl ebx, 16 725*44bedb31SLionel Sambuc or ebx, eax 726*44bedb31SLionel Sambuc mov [chainlenwmask], ebx 727*44bedb31SLionel Sambuc 728*44bedb31SLionel Sambuc;;; if ((uInt)nice_match > s->lookahead) nice_match = s->lookahead; 729*44bedb31SLionel Sambuc 730*44bedb31SLionel Sambuc mov eax, [edx + dsNiceMatch] 731*44bedb31SLionel Sambuc mov ebx, [edx + dsLookahead] 732*44bedb31SLionel Sambuc cmp ebx, eax 733*44bedb31SLionel Sambuc jl LookaheadLess 734*44bedb31SLionel Sambuc mov ebx, eax 735*44bedb31SLionel SambucLookaheadLess: mov [nicematch], ebx 736*44bedb31SLionel Sambuc 737*44bedb31SLionel Sambuc;;; register Bytef *scan = s->window + s->strstart; 738*44bedb31SLionel Sambuc 739*44bedb31SLionel Sambuc mov esi, [edx + dsWindow] 740*44bedb31SLionel Sambuc mov [window], esi 741*44bedb31SLionel Sambuc mov ebp, [edx + dsStrStart] 742*44bedb31SLionel Sambuc lea edi, [esi + ebp] 743*44bedb31SLionel Sambuc mov [scan], edi 744*44bedb31SLionel Sambuc 745*44bedb31SLionel Sambuc;;; Determine how many bytes the scan ptr is off from being 746*44bedb31SLionel Sambuc;;; dword-aligned. 747*44bedb31SLionel Sambuc 748*44bedb31SLionel Sambuc mov eax, edi 749*44bedb31SLionel Sambuc neg eax 750*44bedb31SLionel Sambuc and eax, 3 751*44bedb31SLionel Sambuc mov [scanalign], eax 752*44bedb31SLionel Sambuc 753*44bedb31SLionel Sambuc;;; IPos limit = s->strstart > (IPos)MAX_DIST(s) ? 754*44bedb31SLionel Sambuc;;; s->strstart - (IPos)MAX_DIST(s) : NIL; 755*44bedb31SLionel Sambuc 756*44bedb31SLionel Sambuc mov eax, [edx + dsWSize] 757*44bedb31SLionel Sambuc sub eax, MIN_LOOKAHEAD 758*44bedb31SLionel Sambuc sub ebp, eax 759*44bedb31SLionel Sambuc jg LimitPositive 760*44bedb31SLionel Sambuc xor ebp, ebp 761*44bedb31SLionel SambucLimitPositive: 762*44bedb31SLionel Sambuc 763*44bedb31SLionel Sambuc;;; int best_len = s->prev_length; 764*44bedb31SLionel Sambuc 765*44bedb31SLionel Sambuc mov eax, [edx + dsPrevLen] 766*44bedb31SLionel Sambuc mov [bestlen], eax 767*44bedb31SLionel Sambuc 768*44bedb31SLionel Sambuc;;; Store the sum of s->window + best_len in esi locally, and in esi. 769*44bedb31SLionel Sambuc 770*44bedb31SLionel Sambuc add esi, eax 771*44bedb31SLionel Sambuc mov [windowbestlen], esi 772*44bedb31SLionel Sambuc 773*44bedb31SLionel Sambuc;;; register ush scan_start = *(ushf*)scan; 774*44bedb31SLionel Sambuc;;; register ush scan_end = *(ushf*)(scan+best_len-1); 775*44bedb31SLionel Sambuc;;; Posf *prev = s->prev; 776*44bedb31SLionel Sambuc 777*44bedb31SLionel Sambuc movzx ebx, word ptr [edi] 778*44bedb31SLionel Sambuc mov [scanstart], ebx 779*44bedb31SLionel Sambuc movzx ebx, word ptr [edi + eax - 1] 780*44bedb31SLionel Sambuc mov [scanend], ebx 781*44bedb31SLionel Sambuc mov edi, [edx + dsPrev] 782*44bedb31SLionel Sambuc 783*44bedb31SLionel Sambuc;;; Jump into the main loop. 784*44bedb31SLionel Sambuc 785*44bedb31SLionel Sambuc mov edx, [chainlenwmask] 786*44bedb31SLionel Sambuc jmp short LoopEntry 787*44bedb31SLionel Sambuc 788*44bedb31SLionel Sambucalign 4 789*44bedb31SLionel Sambuc 790*44bedb31SLionel Sambuc;;; do { 791*44bedb31SLionel Sambuc;;; match = s->window + cur_match; 792*44bedb31SLionel Sambuc;;; if (*(ushf*)(match+best_len-1) != scan_end || 793*44bedb31SLionel Sambuc;;; *(ushf*)match != scan_start) continue; 794*44bedb31SLionel Sambuc;;; [...] 795*44bedb31SLionel Sambuc;;; } while ((cur_match = prev[cur_match & wmask]) > limit 796*44bedb31SLionel Sambuc;;; && --chain_length != 0); 797*44bedb31SLionel Sambuc;;; 798*44bedb31SLionel Sambuc;;; Here is the inner loop of the function. The function will spend the 799*44bedb31SLionel Sambuc;;; majority of its time in this loop, and majority of that time will 800*44bedb31SLionel Sambuc;;; be spent in the first ten instructions. 801*44bedb31SLionel Sambuc;;; 802*44bedb31SLionel Sambuc;;; Within this loop: 803*44bedb31SLionel Sambuc;;; ebx = scanend 804*44bedb31SLionel Sambuc;;; ecx = curmatch 805*44bedb31SLionel Sambuc;;; edx = chainlenwmask - i.e., ((chainlen << 16) | wmask) 806*44bedb31SLionel Sambuc;;; esi = windowbestlen - i.e., (window + bestlen) 807*44bedb31SLionel Sambuc;;; edi = prev 808*44bedb31SLionel Sambuc;;; ebp = limit 809*44bedb31SLionel Sambuc 810*44bedb31SLionel SambucLookupLoop: 811*44bedb31SLionel Sambuc and ecx, edx 812*44bedb31SLionel Sambuc movzx ecx, word ptr [edi + ecx*2] 813*44bedb31SLionel Sambuc cmp ecx, ebp 814*44bedb31SLionel Sambuc jbe LeaveNow 815*44bedb31SLionel Sambuc sub edx, 00010000h 816*44bedb31SLionel Sambuc js LeaveNow 817*44bedb31SLionel SambucLoopEntry: movzx eax, word ptr [esi + ecx - 1] 818*44bedb31SLionel Sambuc cmp eax, ebx 819*44bedb31SLionel Sambuc jnz LookupLoop 820*44bedb31SLionel Sambuc mov eax, [window] 821*44bedb31SLionel Sambuc movzx eax, word ptr [eax + ecx] 822*44bedb31SLionel Sambuc cmp eax, [scanstart] 823*44bedb31SLionel Sambuc jnz LookupLoop 824*44bedb31SLionel Sambuc 825*44bedb31SLionel Sambuc;;; Store the current value of chainlen. 826*44bedb31SLionel Sambuc 827*44bedb31SLionel Sambuc mov [chainlenwmask], edx 828*44bedb31SLionel Sambuc 829*44bedb31SLionel Sambuc;;; Point edi to the string under scrutiny, and esi to the string we 830*44bedb31SLionel Sambuc;;; are hoping to match it up with. In actuality, esi and edi are 831*44bedb31SLionel Sambuc;;; both pointed (MAX_MATCH_8 - scanalign) bytes ahead, and edx is 832*44bedb31SLionel Sambuc;;; initialized to -(MAX_MATCH_8 - scanalign). 833*44bedb31SLionel Sambuc 834*44bedb31SLionel Sambuc mov esi, [window] 835*44bedb31SLionel Sambuc mov edi, [scan] 836*44bedb31SLionel Sambuc add esi, ecx 837*44bedb31SLionel Sambuc mov eax, [scanalign] 838*44bedb31SLionel Sambuc mov edx, 0fffffef8h; -(MAX_MATCH_8) 839*44bedb31SLionel Sambuc lea edi, [edi + eax + 0108h] ;MAX_MATCH_8] 840*44bedb31SLionel Sambuc lea esi, [esi + eax + 0108h] ;MAX_MATCH_8] 841*44bedb31SLionel Sambuc 842*44bedb31SLionel Sambuc;;; Test the strings for equality, 8 bytes at a time. At the end, 843*44bedb31SLionel Sambuc;;; adjust edx so that it is offset to the exact byte that mismatched. 844*44bedb31SLionel Sambuc;;; 845*44bedb31SLionel Sambuc;;; We already know at this point that the first three bytes of the 846*44bedb31SLionel Sambuc;;; strings match each other, and they can be safely passed over before 847*44bedb31SLionel Sambuc;;; starting the compare loop. So what this code does is skip over 0-3 848*44bedb31SLionel Sambuc;;; bytes, as much as necessary in order to dword-align the edi 849*44bedb31SLionel Sambuc;;; pointer. (esi will still be misaligned three times out of four.) 850*44bedb31SLionel Sambuc;;; 851*44bedb31SLionel Sambuc;;; It should be confessed that this loop usually does not represent 852*44bedb31SLionel Sambuc;;; much of the total running time. Replacing it with a more 853*44bedb31SLionel Sambuc;;; straightforward "rep cmpsb" would not drastically degrade 854*44bedb31SLionel Sambuc;;; performance. 855*44bedb31SLionel Sambuc 856*44bedb31SLionel SambucLoopCmps: 857*44bedb31SLionel Sambuc mov eax, [esi + edx] 858*44bedb31SLionel Sambuc xor eax, [edi + edx] 859*44bedb31SLionel Sambuc jnz LeaveLoopCmps 860*44bedb31SLionel Sambuc mov eax, [esi + edx + 4] 861*44bedb31SLionel Sambuc xor eax, [edi + edx + 4] 862*44bedb31SLionel Sambuc jnz LeaveLoopCmps4 863*44bedb31SLionel Sambuc add edx, 8 864*44bedb31SLionel Sambuc jnz LoopCmps 865*44bedb31SLionel Sambuc jmp short LenMaximum 866*44bedb31SLionel SambucLeaveLoopCmps4: add edx, 4 867*44bedb31SLionel SambucLeaveLoopCmps: test eax, 0000FFFFh 868*44bedb31SLionel Sambuc jnz LenLower 869*44bedb31SLionel Sambuc add edx, 2 870*44bedb31SLionel Sambuc shr eax, 16 871*44bedb31SLionel SambucLenLower: sub al, 1 872*44bedb31SLionel Sambuc adc edx, 0 873*44bedb31SLionel Sambuc 874*44bedb31SLionel Sambuc;;; Calculate the length of the match. If it is longer than MAX_MATCH, 875*44bedb31SLionel Sambuc;;; then automatically accept it as the best possible match and leave. 876*44bedb31SLionel Sambuc 877*44bedb31SLionel Sambuc lea eax, [edi + edx] 878*44bedb31SLionel Sambuc mov edi, [scan] 879*44bedb31SLionel Sambuc sub eax, edi 880*44bedb31SLionel Sambuc cmp eax, MAX_MATCH 881*44bedb31SLionel Sambuc jge LenMaximum 882*44bedb31SLionel Sambuc 883*44bedb31SLionel Sambuc;;; If the length of the match is not longer than the best match we 884*44bedb31SLionel Sambuc;;; have so far, then forget it and return to the lookup loop. 885*44bedb31SLionel Sambuc 886*44bedb31SLionel Sambuc mov edx, [deflatestate] 887*44bedb31SLionel Sambuc mov ebx, [bestlen] 888*44bedb31SLionel Sambuc cmp eax, ebx 889*44bedb31SLionel Sambuc jg LongerMatch 890*44bedb31SLionel Sambuc mov esi, [windowbestlen] 891*44bedb31SLionel Sambuc mov edi, [edx + dsPrev] 892*44bedb31SLionel Sambuc mov ebx, [scanend] 893*44bedb31SLionel Sambuc mov edx, [chainlenwmask] 894*44bedb31SLionel Sambuc jmp LookupLoop 895*44bedb31SLionel Sambuc 896*44bedb31SLionel Sambuc;;; s->match_start = cur_match; 897*44bedb31SLionel Sambuc;;; best_len = len; 898*44bedb31SLionel Sambuc;;; if (len >= nice_match) break; 899*44bedb31SLionel Sambuc;;; scan_end = *(ushf*)(scan+best_len-1); 900*44bedb31SLionel Sambuc 901*44bedb31SLionel SambucLongerMatch: mov ebx, [nicematch] 902*44bedb31SLionel Sambuc mov [bestlen], eax 903*44bedb31SLionel Sambuc mov [edx + dsMatchStart], ecx 904*44bedb31SLionel Sambuc cmp eax, ebx 905*44bedb31SLionel Sambuc jge LeaveNow 906*44bedb31SLionel Sambuc mov esi, [window] 907*44bedb31SLionel Sambuc add esi, eax 908*44bedb31SLionel Sambuc mov [windowbestlen], esi 909*44bedb31SLionel Sambuc movzx ebx, word ptr [edi + eax - 1] 910*44bedb31SLionel Sambuc mov edi, [edx + dsPrev] 911*44bedb31SLionel Sambuc mov [scanend], ebx 912*44bedb31SLionel Sambuc mov edx, [chainlenwmask] 913*44bedb31SLionel Sambuc jmp LookupLoop 914*44bedb31SLionel Sambuc 915*44bedb31SLionel Sambuc;;; Accept the current string, with the maximum possible length. 916*44bedb31SLionel Sambuc 917*44bedb31SLionel SambucLenMaximum: mov edx, [deflatestate] 918*44bedb31SLionel Sambuc mov dword ptr [bestlen], MAX_MATCH 919*44bedb31SLionel Sambuc mov [edx + dsMatchStart], ecx 920*44bedb31SLionel Sambuc 921*44bedb31SLionel Sambuc;;; if ((uInt)best_len <= s->lookahead) return (uInt)best_len; 922*44bedb31SLionel Sambuc;;; return s->lookahead; 923*44bedb31SLionel Sambuc 924*44bedb31SLionel SambucLeaveNow: 925*44bedb31SLionel Sambuc mov edx, [deflatestate] 926*44bedb31SLionel Sambuc mov ebx, [bestlen] 927*44bedb31SLionel Sambuc mov eax, [edx + dsLookahead] 928*44bedb31SLionel Sambuc cmp ebx, eax 929*44bedb31SLionel Sambuc jg LookaheadRet 930*44bedb31SLionel Sambuc mov eax, ebx 931*44bedb31SLionel SambucLookaheadRet: 932*44bedb31SLionel Sambuc 933*44bedb31SLionel Sambuc;;; Restore the stack and return from whence we came. 934*44bedb31SLionel Sambuc 935*44bedb31SLionel Sambuc add esp, LocalVarsSize 936*44bedb31SLionel Sambuc pop ebx 937*44bedb31SLionel Sambuc pop esi 938*44bedb31SLionel Sambuc pop edi 939*44bedb31SLionel Sambuc pop ebp 940*44bedb31SLionel Sambuc 941*44bedb31SLionel Sambuc ret 942*44bedb31SLionel Sambuc; please don't remove this string ! 943*44bedb31SLionel Sambuc; Your can freely use gvmat32 in any free or commercial app if you don't remove the string in the binary! 944*44bedb31SLionel Sambuc db 0dh,0ah,"asm686 with masm, optimised assembly code from Brian Raiter, written 1998",0dh,0ah 945*44bedb31SLionel Sambuc 946*44bedb31SLionel Sambuc 947*44bedb31SLionel SambucIFDEF NOOLDPENTIUMCODE 948*44bedb31SLionel Sambuc IFDEF NOUNDERLINE 949*44bedb31SLionel Sambuc longest_match endp 950*44bedb31SLionel Sambuc ELSE 951*44bedb31SLionel Sambuc _longest_match endp 952*44bedb31SLionel Sambuc ENDIF 953*44bedb31SLionel Sambuc 954*44bedb31SLionel Sambuc IFDEF NOUNDERLINE 955*44bedb31SLionel Sambuc match_init proc near 956*44bedb31SLionel Sambuc ret 957*44bedb31SLionel Sambuc match_init endp 958*44bedb31SLionel Sambuc ELSE 959*44bedb31SLionel Sambuc _match_init proc near 960*44bedb31SLionel Sambuc ret 961*44bedb31SLionel Sambuc _match_init endp 962*44bedb31SLionel Sambuc ENDIF 963*44bedb31SLionel SambucELSE 964*44bedb31SLionel Sambuc IFDEF NOUNDERLINE 965*44bedb31SLionel Sambuc longest_match_686 endp 966*44bedb31SLionel Sambuc ELSE 967*44bedb31SLionel Sambuc _longest_match_686 endp 968*44bedb31SLionel Sambuc ENDIF 969*44bedb31SLionel SambucENDIF 970*44bedb31SLionel Sambuc 971*44bedb31SLionel Sambuc_TEXT ends 972*44bedb31SLionel Sambucend 973