.ident	"sparcv8plus.s, Version 1.4"
.ident	"SPARC v9 ISA artwork by Andy Polyakov <appro@fy.chalmers.se>"

/*
 * ====================================================================
 * Written by Andy Polyakov <appro@fy.chalmers.se> for the OpenSSL
 * project.
 *
 * Rights for redistribution and usage in source and binary forms are
 * granted according to the OpenSSL license. Warranty of any kind is
 * disclaimed.
 * ====================================================================
 */

/*
 * This is my modest contribution to the OpenSSL project (see
 * http://www.openssl.org/ for more information about it) and is
 * a drop-in UltraSPARC ISA replacement for crypto/bn/bn_asm.c
 * module. For updates see http://fy.chalmers.se/~appro/hpe/.
 *
 * Questions-n-answers.
 *
 * Q. How to compile?
 * A. With SC4.x/SC5.x:
 *
 *	cc -xarch=v8plus -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *
 *    and with gcc:
 *
 *	gcc -mcpu=ultrasparc -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *
 *    or if the above fails (it does if you have gas installed):
 *
 *	gcc -E bn_asm.sparc.v8plus.S | as -xarch=v8plus /dev/fd/0 -o bn_asm.o
 *
 *    Quick-n-dirty way to fuse the module into the library.
 *    Provided that the library is already configured and built
 *    (in 0.9.2 case with no-asm option):
 *
 *	# cd crypto/bn
 *	# cp /some/place/bn_asm.sparc.v8plus.S .
 *	# cc -xarch=v8plus -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *	# make
 *	# cd ../..
 *	# make; make test
 *
 *    Quick-n-dirty way to get rid of it:
 *
 *	# cd crypto/bn
 *	# touch bn_asm.c
 *	# make
 *	# cd ../..
 *	# make; make test
 *
 * Q. V8plus architecture? What kind of beast is that?
 * A. Well, it's rather a programming model than an architecture...
 *    It's actually a v9-compliant (i.e. *any* UltraSPARC) CPU under
 *    special conditions, namely when kernel doesn't preserve upper
 *    32 bits of otherwise 64-bit registers during a context switch.
 *
 * Q. Why just UltraSPARC? What about SuperSPARC?
 * A. Original release did target UltraSPARC only. Now SuperSPARC
 *    version is provided along. Both versions share bn_*comba[48]
 *    implementations (see comment later in code for explanation).
 *    But what's so special about this UltraSPARC implementation?
 *    Why didn't I let compiler do the job? Trouble is that most of
 *    available compilers (well, SC5.0 is the only exception) don't
 *    attempt to take advantage of UltraSPARC's 64-bitness under
 *    32-bit kernels even though it's perfectly possible (see next
 *    question).
 *
 * Q. 64-bit registers under 32-bit kernels? Didn't you just say it
 *    doesn't work?
 * A. You can't address *all* registers as 64-bit wide:-( The catch is
 *    that you actually may rely upon %o0-%o5 and %g1-%g4 being fully
 *    preserved if you're in a leaf function, i.e. such never calling
 *    any other functions. All functions in this module are leaf and
 *    10 registers is a handful. And as a matter of fact non-"comba"
 *    routines don't require even that much and I could even afford to
 *    not allocate own stack frame for 'em:-)
 *
 * Q. What about 64-bit kernels?
 * A. What about 'em? Just kidding:-) Pure 64-bit version is currently
 *    under evaluation and development...
 *
 * Q. What about shared libraries?
 * A. What about 'em? Kidding again:-) Code does *not* contain any
 *    code position dependencies and it's safe to include it into
 *    shared library as is.
 *
 * Q. How much faster does it go?
 * A. Do you have a good benchmark? In either case below is what I
 *    experience with crypto/bn/expspeed.c test program:
 *
 *	v8plus module on U10/300MHz against bn_asm.c compiled with:
 *
 *	cc-5.0 -xarch=v8plus -xO5 -xdepend	+7-12%
 *	cc-4.2 -xarch=v8plus -xO5 -xdepend	+25-35%
 *	egcs-1.1.2 -mcpu=ultrasparc -O3		+35-45%
 *
 *	v8 module on SS10/60MHz against bn_asm.c compiled with:
 *
 *	cc-5.0 -xarch=v8 -xO5 -xdepend		+7-10%
 *	cc-4.2 -xarch=v8 -xO5 -xdepend		+10%
 *	egcs-1.1.2 -mv8 -O3			+35-45%
 *
 *    As you can see it's damn hard to beat the new Sun C compiler
 *    and it's in first place GNU C users who will appreciate this
 *    assembler implementation:-)
 */

/*
 * Revision history.
 *
 * 1.0	- initial release;
 * 1.1	- new loop unrolling model(*);
 *	- some more fine tuning;
 * 1.2	- made gas friendly;
 *	- updates to documentation concerning v9;
 *	- new performance comparison matrix;
 * 1.3	- fixed problem with /usr/ccs/lib/cpp;
 * 1.4	- native V9 bn_*_comba[48] implementation (15% more efficient)
 *	  resulting in slight overall performance kick;
 *	- some retunes;
 *	- support for GNU as added;
 *
 * (*)	Originally unrolled loop looked like this:
 *	    for (;;) {
 *		op(p+0); if (--n==0) break;
 *		op(p+1); if (--n==0) break;
 *		op(p+2); if (--n==0) break;
 *		op(p+3); if (--n==0) break;
 *		p+=4;
 *	    }
 *	I unroll according to following:
 *	    while (n&~3) {
 *		op(p+0); op(p+1); op(p+2); op(p+3);
 *		p+=4; n-=4;
 *	    }
 *	    if (n) {
 *		op(p+0); if (--n==0) return;
 *		op(p+1); if (--n==0) return;
 *		op(p+2); return;
 *	    }
 */

/*
 * GNU assembler can't stand stuw:-(
 */
#define stuw st

.section	".text",#alloc,#execinstr
.file		"bn_asm.sparc.v8plus.S"

.align	32

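/*
 * Illustrative sketch, not part of the original module: a C model of
 * what bn_mul_add_words computes, laid out after the unrolling model
 * from the revision history (four steps per pass, then a tail of at
 * most three). The names mul_add_step and mul_add_words_ref are made
 * up for this note, and BN_ULONG is assumed to be 32 bits wide as in
 * the v8plus build.
 *
```c
#include <assert.h>
#include <stdint.h>

// One step: *rp += *ap * w + carry; returns the new carry (high word).
// The 64-bit sum cannot overflow: (2^32-1)^2 + 2*(2^32-1) == 2^64-1.
static uint32_t mul_add_step(uint32_t *rp, const uint32_t *ap,
                             uint32_t w, uint32_t carry)
{
    uint64_t t = (uint64_t)ap[0] * w + rp[0] + carry;
    rp[0] = (uint32_t)t;
    return (uint32_t)(t >> 32);
}

// Reference model of the whole routine, unrolled 4x with a 1..3 word tail.
static uint32_t mul_add_words_ref(uint32_t *rp, const uint32_t *ap,
                                  int n, uint32_t w)
{
    uint32_t c = 0;

    if (n <= 0)
        return 0;                 // the assembly returns 0 early as well
    while (n & ~3) {              // at least four words left
        c = mul_add_step(rp + 0, ap + 0, w, c);
        c = mul_add_step(rp + 1, ap + 1, w, c);
        c = mul_add_step(rp + 2, ap + 2, w, c);
        c = mul_add_step(rp + 3, ap + 3, w, c);
        rp += 4; ap += 4; n -= 4;
    }
    if (n) {                      // leftover words, no loop needed
        c = mul_add_step(rp + 0, ap + 0, w, c); if (--n == 0) return c;
        c = mul_add_step(rp + 1, ap + 1, w, c); if (--n == 0) return c;
        c = mul_add_step(rp + 2, ap + 2, w, c);
    }
    return c;
}
```
 *
 * Calling mul_add_words_ref(r, a, 5, 10) runs one unrolled pass and a
 * one-word tail, which covers both paths of the model.
 */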
.global bn_mul_add_words
/*
 * BN_ULONG bn_mul_add_words(rp,ap,num,w)
 * BN_ULONG *rp,*ap;
 * int num;
 * BN_ULONG w;
 */
bn_mul_add_words:
	brgz,a	%o2,.L_bn_mul_add_words_proceed
	lduw	[%o1],%g2
	retl
	clr	%o0

.L_bn_mul_add_words_proceed:
	srl	%o3,%g0,%o3	! clruw	%o3
	andcc	%o2,-4,%g0
	bz,pn	%icc,.L_bn_mul_add_words_tail
	clr	%o5

.L_bn_mul_add_words_loop:	! wow! 32 aligned!
	lduw	[%o0],%g1
	lduw	[%o1+4],%g3
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	nop
	add	%o4,%g2,%o4
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5

	lduw	[%o0+4],%g1
	lduw	[%o1+8],%g2
	mulx	%o3,%g3,%g3
	add	%g1,%o5,%o4
	dec	4,%o2
	add	%o4,%g3,%o4
	stuw	%o4,[%o0+4]
	srlx	%o4,32,%o5

	lduw	[%o0+8],%g1
	lduw	[%o1+12],%g3
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	inc	16,%o1
	add	%o4,%g2,%o4
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5

	lduw	[%o0+12],%g1
	mulx	%o3,%g3,%g3
	add	%g1,%o5,%o4
	inc	16,%o0
	add	%o4,%g3,%o4
	andcc	%o2,-4,%g0
	stuw	%o4,[%o0-4]
	srlx	%o4,32,%o5
	bnz,a,pt	%icc,.L_bn_mul_add_words_loop
	lduw	[%o1],%g2

	brnz,a,pn	%o2,.L_bn_mul_add_words_tail
	lduw	[%o1],%g2
.L_bn_mul_add_words_return:
	retl
	mov	%o5,%o0

.L_bn_mul_add_words_tail:
	lduw	[%o0],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	dec	%o2
	add	%o4,%g2,%o4
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_add_words_return
	stuw	%o4,[%o0]

	lduw	[%o1+4],%g2
	lduw	[%o0+4],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	dec	%o2
	add	%o4,%g2,%o4
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_add_words_return
	stuw	%o4,[%o0+4]

	lduw	[%o1+8],%g2
	lduw	[%o0+8],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	add	%o4,%g2,%o4
	stuw	%o4,[%o0+8]
	retl
	srlx	%o4,32,%o0

.type	bn_mul_add_words,#function
.size	bn_mul_add_words,(.-bn_mul_add_words)

.align	32

.global bn_mul_words
/*
 * BN_ULONG bn_mul_words(rp,ap,num,w)
 * BN_ULONG *rp,*ap;
 * int num;
 * BN_ULONG w;
 */
bn_mul_words:
	brgz,a	%o2,.L_bn_mul_words_proceeed
	lduw	[%o1],%g2
	retl
	clr	%o0

.L_bn_mul_words_proceeed:
	srl	%o3,%g0,%o3	! clruw	%o3
	andcc	%o2,-4,%g0
	bz,pn	%icc,.L_bn_mul_words_tail
	clr	%o5

.L_bn_mul_words_loop:		! wow! 32 aligned!
	lduw	[%o1+4],%g3
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	nop
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5

	lduw	[%o1+8],%g2
	mulx	%o3,%g3,%g3
	add	%g3,%o5,%o4
	dec	4,%o2
	stuw	%o4,[%o0+4]
	srlx	%o4,32,%o5

	lduw	[%o1+12],%g3
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	inc	16,%o1
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5

	mulx	%o3,%g3,%g3
	add	%g3,%o5,%o4
	inc	16,%o0
	stuw	%o4,[%o0-4]
	srlx	%o4,32,%o5
	andcc	%o2,-4,%g0
	bnz,a,pt	%icc,.L_bn_mul_words_loop
	lduw	[%o1],%g2
	nop
	nop

	brnz,a,pn	%o2,.L_bn_mul_words_tail
	lduw	[%o1],%g2
.L_bn_mul_words_return:
	retl
	mov	%o5,%o0

.L_bn_mul_words_tail:
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	dec	%o2
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_words_return
	stuw	%o4,[%o0]

	lduw	[%o1+4],%g2
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	dec	%o2
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_words_return
	stuw	%o4,[%o0+4]

	lduw	[%o1+8],%g2
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	stuw	%o4,[%o0+8]
	retl
	srlx	%o4,32,%o0

.type	bn_mul_words,#function
.size	bn_mul_words,(.-bn_mul_words)

.align	32
.global	bn_sqr_words
/*
 * void bn_sqr_words(r,a,n)
 * BN_ULONG *r,*a;
 * int n;
 */
bn_sqr_words:
	brgz,a	%o2,.L_bn_sqr_words_proceeed
	lduw	[%o1],%g2
	retl
	clr	%o0

.L_bn_sqr_words_proceeed:
	andcc	%o2,-4,%g0
	nop
	bz,pn	%icc,.L_bn_sqr_words_tail
	nop

.L_bn_sqr_words_loop:		! wow! 32 aligned!
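/*
 * Illustrative sketch, not part of the original module: what each
 * lduw/mulx/stuw/srlx/stuw group of bn_sqr_words amounts to in C.
 * Every input word yields two output words, so r must hold 2*n words.
 * sqr_words_ref is a hypothetical name and BN_ULONG is assumed to be
 * 32 bits wide.
 *
```c
#include <assert.h>
#include <stdint.h>

// Reference model only: r[2i] and r[2i+1] receive the low and high
// halves of a[i] squared. There is no carry between words.
static void sqr_words_ref(uint32_t *r, const uint32_t *a, int n)
{
    for (int i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] * a[i];   // mulx  %g2,%g2,%o4
        r[2 * i]     = (uint32_t)t;           // stuw  %o4,[%o0]
        r[2 * i + 1] = (uint32_t)(t >> 32);   // srlx + stuw %o5,[%o0+4]
    }
}
```
 *
 * Since the words are independent, the unrolled assembly is free to
 * interleave loads, multiplies and stores of neighbouring words.
 */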
	lduw	[%o1+4],%g3
	mulx	%g2,%g2,%o4
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5
	stuw	%o5,[%o0+4]
	nop

	lduw	[%o1+8],%g2
	mulx	%g3,%g3,%o4
	dec	4,%o2
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5
	stuw	%o5,[%o0+12]

	lduw	[%o1+12],%g3
	mulx	%g2,%g2,%o4
	srlx	%o4,32,%o5
	stuw	%o4,[%o0+16]
	inc	16,%o1
	stuw	%o5,[%o0+20]

	mulx	%g3,%g3,%o4
	inc	32,%o0
	stuw	%o4,[%o0-8]
	srlx	%o4,32,%o5
	andcc	%o2,-4,%g2
	stuw	%o5,[%o0-4]
	bnz,a,pt	%icc,.L_bn_sqr_words_loop
	lduw	[%o1],%g2
	nop

	brnz,a,pn	%o2,.L_bn_sqr_words_tail
	lduw	[%o1],%g2
.L_bn_sqr_words_return:
	retl
	clr	%o0

.L_bn_sqr_words_tail:
	mulx	%g2,%g2,%o4
	dec	%o2
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_sqr_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+4],%g2
	mulx	%g2,%g2,%o4
	dec	%o2
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_sqr_words_return
	stuw	%o5,[%o0+12]

	lduw	[%o1+8],%g2
	mulx	%g2,%g2,%o4
	srlx	%o4,32,%o5
	stuw	%o4,[%o0+16]
	stuw	%o5,[%o0+20]
	retl
	clr	%o0

.type	bn_sqr_words,#function
.size	bn_sqr_words,(.-bn_sqr_words)

.align	32
.global bn_div_words
/*
 * BN_ULONG bn_div_words(h,l,d)
 * BN_ULONG h,l,d;
 */
bn_div_words:
	sllx	%o0,32,%o0
	or	%o0,%o1,%o0
	udivx	%o0,%o2,%o0
	retl
	srl	%o0,%g0,%o0	! clruw	%o0

.type	bn_div_words,#function
.size	bn_div_words,(.-bn_div_words)

.align	32

.global bn_add_words
/*
 * BN_ULONG bn_add_words(rp,ap,bp,n)
 * BN_ULONG *rp,*ap,*bp;
 * int n;
 */
bn_add_words:
	brgz,a	%o3,.L_bn_add_words_proceed
	lduw	[%o1],%o4
	retl
	clr	%o0

.L_bn_add_words_proceed:
	andcc	%o3,-4,%g0
	bz,pn	%icc,.L_bn_add_words_tail
	addcc	%g0,0,%g0	! clear carry flag
	nop

.L_bn_add_words_loop:		! wow! 32 aligned!
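/*
 * Illustrative sketch, not part of the original module: the carry
 * chain that the addccc instructions of bn_add_words implement. In the
 * assembly the carry lives in the %icc carry flag across the whole
 * routine, which is why loop control uses dec, inc, and and brnz (none
 * of which touch the condition codes) instead of andcc. add_words_ref
 * is a hypothetical name and BN_ULONG is assumed to be 32 bits wide.
 *
```c
#include <assert.h>
#include <stdint.h>

// Reference model only: r = a + b over n words, returning the final carry.
static uint32_t add_words_ref(uint32_t *r, const uint32_t *a,
                              const uint32_t *b, int n)
{
    uint32_t carry = 0;                       // addcc %g0,0,%g0 clears it

    for (int i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;   // addccc
        r[i] = (uint32_t)t;                           // stuw
        carry = (uint32_t)(t >> 32);                  // 0 or 1
    }
    return carry;                             // clr %o0; movcs %icc,1,%o0
}
```
 *
 * Adding {1, 0} to {0xFFFFFFFF, 0xFFFFFFFF} ripples the carry through
 * both words and returns 1.
 */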
	dec	4,%o3
	lduw	[%o2],%o5
	lduw	[%o1+4],%g1
	lduw	[%o2+4],%g2
	lduw	[%o1+8],%g3
	lduw	[%o2+8],%g4
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0]

	lduw	[%o1+12],%o4
	lduw	[%o2+12],%o5
	inc	16,%o1
	addccc	%g1,%g2,%g1
	stuw	%g1,[%o0+4]

	inc	16,%o2
	addccc	%g3,%g4,%g3
	stuw	%g3,[%o0+8]

	inc	16,%o0
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0-4]
	and	%o3,-4,%g1
	brnz,a,pt	%g1,.L_bn_add_words_loop
	lduw	[%o1],%o4

	brnz,a,pn	%o3,.L_bn_add_words_tail
	lduw	[%o1],%o4
.L_bn_add_words_return:
	clr	%o0
	retl
	movcs	%icc,1,%o0
	nop

.L_bn_add_words_tail:
	lduw	[%o2],%o5
	dec	%o3
	addccc	%o5,%o4,%o5
	brz,pt	%o3,.L_bn_add_words_return
	stuw	%o5,[%o0]

	lduw	[%o1+4],%o4
	lduw	[%o2+4],%o5
	dec	%o3
	addccc	%o5,%o4,%o5
	brz,pt	%o3,.L_bn_add_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+8],%o4
	lduw	[%o2+8],%o5
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0+8]
	clr	%o0
	retl
	movcs	%icc,1,%o0

.type	bn_add_words,#function
.size	bn_add_words,(.-bn_add_words)

.global bn_sub_words
/*
 * BN_ULONG bn_sub_words(rp,ap,bp,n)
 * BN_ULONG *rp,*ap,*bp;
 * int n;
 */
bn_sub_words:
	brgz,a	%o3,.L_bn_sub_words_proceed
	lduw	[%o1],%o4
	retl
	clr	%o0

.L_bn_sub_words_proceed:
	andcc	%o3,-4,%g0
	bz,pn	%icc,.L_bn_sub_words_tail
	addcc	%g0,0,%g0	! clear carry flag
	nop

.L_bn_sub_words_loop:		! wow! 32 aligned!
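/*
 * Illustrative sketch, not part of the original module: the borrow
 * chain behind the subccc instructions of bn_sub_words. On SPARC the
 * carry flag doubles as the borrow flag after a subtraction, so the
 * closing "movcs %icc,1,%o0" returns 1 exactly when a - b went below
 * zero. sub_words_ref is a hypothetical name and BN_ULONG is assumed
 * to be 32 bits wide.
 *
```c
#include <assert.h>
#include <stdint.h>

// Reference model only: r = a - b over n words, returning the final borrow.
static uint32_t sub_words_ref(uint32_t *r, const uint32_t *a,
                              const uint32_t *b, int n)
{
    uint32_t borrow = 0;

    for (int i = 0; i < n; i++) {
        // Widen to 64 bits; a wrap below zero sets the top bit.
        uint64_t t = (uint64_t)a[i] - b[i] - borrow;   // subccc
        r[i] = (uint32_t)t;                            // stuw
        borrow = (uint32_t)(t >> 63);                  // 0 or 1
    }
    return borrow;
}
```
 *
 * Subtracting {1, 0} from {0, 0} leaves 0xFFFFFFFF in both result
 * words and returns a borrow of 1.
 */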
	dec	4,%o3
	lduw	[%o2],%o5
	lduw	[%o1+4],%g1
	lduw	[%o2+4],%g2
	lduw	[%o1+8],%g3
	lduw	[%o2+8],%g4
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0]

	lduw	[%o1+12],%o4
	lduw	[%o2+12],%o5
	inc	16,%o1
	subccc	%g1,%g2,%g2
	stuw	%g2,[%o0+4]

	inc	16,%o2
	subccc	%g3,%g4,%g4
	stuw	%g4,[%o0+8]

	inc	16,%o0
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0-4]
	and	%o3,-4,%g1
	brnz,a,pt	%g1,.L_bn_sub_words_loop
	lduw	[%o1],%o4

	brnz,a,pn	%o3,.L_bn_sub_words_tail
	lduw	[%o1],%o4
.L_bn_sub_words_return:
	clr	%o0
	retl
	movcs	%icc,1,%o0
	nop

.L_bn_sub_words_tail:		! wow! 32 aligned!
	lduw	[%o2],%o5
	dec	%o3
	subccc	%o4,%o5,%o5
	brz,pt	%o3,.L_bn_sub_words_return
	stuw	%o5,[%o0]

	lduw	[%o1+4],%o4
	lduw	[%o2+4],%o5
	dec	%o3
	subccc	%o4,%o5,%o5
	brz,pt	%o3,.L_bn_sub_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+8],%o4
	lduw	[%o2+8],%o5
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0+8]
	clr	%o0
	retl
	movcs	%icc,1,%o0

.type	bn_sub_words,#function
.size	bn_sub_words,(.-bn_sub_words)

/*
 * Code below depends on the fact that upper parts of the %l0-%l7
 * and %i0-%i7 are zeroed by kernel after context switch. In
 * previous versions this comment stated that "the trouble is that
 * it's not feasible to implement the mumbo-jumbo in less V9
 * instructions:-(" which apparently isn't true thanks to
 * 'bcs,a %xcc,.+8; inc %rd' pair. But the performance improvement
 * results not from the shorter code, but from elimination of
 * multicycle non-pairable 'rd %y,%rd' instructions.
 *
 * Andy.
 */

#define FRAME_SIZE	-96

/*
 * Here is register usage map for *all* routines below.
 */
#define t_1	%o0
#define	t_2	%o1
#define c_12	%o2
#define c_3	%o3

#define ap(I)	[%i1+4*I]
#define bp(I)	[%i2+4*I]
#define rp(I)	[%i0+4*I]

#define	a_0	%l0
#define	a_1	%l1
#define	a_2	%l2
#define	a_3	%l3
#define	a_4	%l4
#define	a_5	%l5
#define	a_6	%l6
#define	a_7	%l7

#define	b_0	%i3
#define	b_1	%i4
#define	b_2	%i5
#define	b_3	%o4
#define	b_4	%o5
#define	b_5	%o7
#define	b_6	%g1
#define	b_7	%g4

.align	32
.global bn_mul_comba8
/*
 * void bn_mul_comba8(r,a,b)
 * BN_ULONG *r,*a,*b;
 */
bn_mul_comba8:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2
	lduw	bp(0),b_0	!=
	lduw	bp(1),b_1
	mulx	a_0,b_0,t_1	!mul_add_c(a[0],b[0],c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!=!r[0]=c1;

	lduw	ap(1),a_1
	mulx	a_0,b_1,t_1	!mul_add_c(a[0],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(2),a_2
	mulx	a_1,b_0,t_1	!=!mul_add_c(a[1],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_0,t_1	!mul_add_c(a[2],b[0],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(2),b_2	!=
	mulx	a_1,b_1,t_1	!mul_add_c(a[1],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(3),b_3
	mulx	a_0,b_2,t_1	!mul_add_c(a[0],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_3,t_1	!mul_add_c(a[0],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_2,t_1	!=!mul_add_c(a[1],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_2,b_1,t_1	!mul_add_c(a[2],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(4),a_4
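/*
 * Illustrative sketch, not part of the original module: a C model of
 * the mulx / addcc / "bcs,a %xcc,.+8" / add groups that make up
 * bn_mul_comba8. The annulled branch to .+8 executes its delay slot
 * only when the 64-bit carry is set, so the pair acts as a conditional
 * "c_3 += 1<<32" (t_2 holds 1<<32). comba_acc, finish_column and
 * mul_comba_ref are hypothetical names; mul_add_c borrows its name
 * from the inline comments but takes different arguments here.
 *
```c
#include <assert.h>
#include <stdint.h>

// Accumulator pair mirroring c_12 and c_3 from the register usage map.
typedef struct {
    uint64_t c_12;   // low 64 bits of the running column sum
    uint64_t c_3;    // overflow count, kept pre-shifted left by 32
} comba_acc;

// mulx + addcc + "bcs,a %xcc,.+8; add c_3,t_2,c_3"
static void mul_add_c(comba_acc *acc, uint32_t a, uint32_t b)
{
    uint64_t t_1 = (uint64_t)a * b;
    acc->c_12 += t_1;
    if (acc->c_12 < t_1)                 // carry out of the 64-bit add
        acc->c_3 += (uint64_t)1 << 32;
}

// srlx + stuw + or, plus the clr c_3 that opens the next column
static uint32_t finish_column(comba_acc *acc)
{
    uint32_t r = (uint32_t)acc->c_12;
    acc->c_12 = (acc->c_12 >> 32) | acc->c_3;   // the two halves never overlap
    acc->c_3 = 0;
    return r;
}

// Hypothetical driver: column-wise product of two n-word numbers,
// writing 2*n result words. The assembly is this loop fully unrolled.
static void mul_comba_ref(uint32_t *r, const uint32_t *a,
                          const uint32_t *b, int n)
{
    comba_acc acc = {0, 0};

    for (int k = 0; k < 2 * n - 1; k++) {
        for (int i = 0; i < n; i++) {
            int j = k - i;
            if (j >= 0 && j < n)
                mul_add_c(&acc, a[i], b[j]);
        }
        r[k] = finish_column(&acc);
    }
    r[2 * n - 1] = (uint32_t)acc.c_12;
}
```
 *
 * With n set to 8 the driver visits the same a[i]*b[j] products,
 * column by column, as the mul_add_c annotations in this routine.
 */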
	mulx	a_3,b_0,t_1	!=!mul_add_c(a[3],b[0],c1,c2,c3);!=
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_4,b_0,t_1	!mul_add_c(a[4],b[0],c2,c3,c1);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_1,t_1	!=!mul_add_c(a[3],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_2,t_1	!=!mul_add_c(a[2],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(4),b_4	!=
	mulx	a_1,b_3,t_1	!mul_add_c(a[1],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(5),b_5
	mulx	a_0,b_4,t_1	!mul_add_c(a[0],b[4],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_5,t_1	!mul_add_c(a[0],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_4,t_1	!mul_add_c(a[1],b[4],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_2,b_3,t_1	!mul_add_c(a[2],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_2,t_1	!mul_add_c(a[3],b[2],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(5),a_5
	mulx	a_4,b_1,t_1	!mul_add_c(a[4],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(6),a_6
	mulx	a_5,b_0,t_1	!=!mul_add_c(a[5],b[0],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_6,b_0,t_1	!mul_add_c(a[6],b[0],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,b_1,t_1	!=!mul_add_c(a[5],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,b_2,t_1	!=!mul_add_c(a[4],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_3,t_1	!=!mul_add_c(a[3],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_4,t_1	!=!mul_add_c(a[2],b[4],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(6),b_6	!=
	mulx	a_1,b_5,t_1	!mul_add_c(a[1],b[5],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(7),b_7
	mulx	a_0,b_6,t_1	!mul_add_c(a[0],b[6],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_7,t_1	!mul_add_c(a[0],b[7],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_6,t_1	!mul_add_c(a[1],b[6],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_2,b_5,t_1	!mul_add_c(a[2],b[5],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_4,t_1	!mul_add_c(a[3],b[4],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_4,b_3,t_1	!mul_add_c(a[4],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_2,t_1	!mul_add_c(a[5],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(7),a_7
	mulx	a_6,b_1,t_1	!=!mul_add_c(a[6],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_7,b_0,t_1	!=!mul_add_c(a[7],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(7)	!r[7]=c2;
	or	c_12,c_3,c_12

	mulx	a_7,b_1,t_1	!=!mul_add_c(a[7],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_6,b_2,t_1	!mul_add_c(a[6],b[2],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_5,b_3,t_1	!mul_add_c(a[5],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_4,b_4,t_1	!mul_add_c(a[4],b[4],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_3,b_5,t_1	!mul_add_c(a[3],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_2,b_6,t_1	!mul_add_c(a[2],b[6],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_1,b_7,t_1	!mul_add_c(a[1],b[7],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	srlx	t_1,32,c_12
	stuw	t_1,rp(8)	!r[8]=c3;
	or	c_12,c_3,c_12

	mulx	a_2,b_7,t_1	!=!mul_add_c(a[2],b[7],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	mulx	a_3,b_6,t_1	!mul_add_c(a[3],b[6],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_4,b_5,t_1	!mul_add_c(a[4],b[5],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_4,t_1	!mul_add_c(a[5],b[4],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_3,t_1	!mul_add_c(a[6],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_7,b_2,t_1	!mul_add_c(a[7],b[2],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(9)	!r[9]=c1;
	or	c_12,c_3,c_12	!=

	mulx	a_7,b_3,t_1	!mul_add_c(a[7],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_4,t_1	!mul_add_c(a[6],b[4],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_5,t_1	!mul_add_c(a[5],b[5],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_4,b_6,t_1	!mul_add_c(a[4],b[6],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_7,t_1	!mul_add_c(a[3],b[7],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(10)	!r[10]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_4,b_7,t_1	!mul_add_c(a[4],b[7],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_6,t_1	!mul_add_c(a[5],b[6],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_5,t_1	!mul_add_c(a[6],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_7,b_4,t_1	!mul_add_c(a[7],b[4],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(11)	!r[11]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_7,b_5,t_1	!mul_add_c(a[7],b[5],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_6,b_6,t_1	!mul_add_c(a[6],b[6],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_5,b_7,t_1	!mul_add_c(a[5],b[7],c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(12)	!r[12]=c1;
	or	c_12,c_3,c_12	!=

	mulx	a_6,b_7,t_1	!mul_add_c(a[6],b[7],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_7,b_6,t_1	!mul_add_c(a[7],b[6],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	st	t_1,rp(13)	!r[13]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_7,b_7,t_1	!mul_add_c(a[7],b[7],c3,c1,c2);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(14)	!r[14]=c3;
	stuw	c_12,rp(15)	!r[15]=c1;

	ret
	restore	%g0,%g0,%o0	!=

.type	bn_mul_comba8,#function
.size	bn_mul_comba8,(.-bn_mul_comba8)

.align	32

.global bn_mul_comba4
/*
 * void bn_mul_comba4(r,a,b)
 * BN_ULONG *r,*a,*b;
 */
bn_mul_comba4:
	save	%sp,FRAME_SIZE,%sp
	lduw	ap(0),a_0
	mov	1,t_2
	lduw	bp(0),b_0
	sllx	t_2,32,t_2	!=
	lduw	bp(1),b_1
	mulx	a_0,b_0,t_1	!mul_add_c(a[0],b[0],c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!=!r[0]=c1;

	lduw	ap(1),a_1
	mulx	a_0,b_1,t_1	!mul_add_c(a[0],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(2),a_2
	mulx	a_1,b_0,t_1	!=!mul_add_c(a[1],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_0,t_1	!mul_add_c(a[2],b[0],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(2),b_2	!=
	mulx	a_1,b_1,t_1	!mul_add_c(a[1],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(3),b_3
	mulx	a_0,b_2,t_1	!mul_add_c(a[0],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_3,t_1	!mul_add_c(a[0],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_2,t_1	!mul_add_c(a[1],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_2,b_1,t_1	!mul_add_c(a[2],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_0,t_1	!mul_add_c(a[3],b[0],c1,c2,c3);!=
	addcc	c_12,t_1,t_1	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(3)	!=!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_3,b_1,t_1	!mul_add_c(a[3],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_2,t_1	!mul_add_c(a[2],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,b_3,t_1	!mul_add_c(a[1],b[3],c2,c3,c1);
	addcc	c_12,t_1,t_1	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!=!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_3,t_1	!mul_add_c(a[2],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_2,t_1	!mul_add_c(a[3],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!=!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_3,b_3,t_1	!mul_add_c(a[3],b[3],c1,c2,c3);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(6)	!r[6]=c1;
	stuw	c_12,rp(7)	!r[7]=c2;

	ret
	restore	%g0,%g0,%o0

.type	bn_mul_comba4,#function
.size	bn_mul_comba4,(.-bn_mul_comba4)

.align	32

.global bn_sqr_comba8
bn_sqr_comba8:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2
	lduw	ap(1),a_1
	mulx	a_0,a_0,t_1	!sqr_add_c(a,0,c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!r[0]=c1;

	lduw	ap(2),a_2
	mulx	a_0,a_1,t_1	!=!sqr_add_c2(a,1,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,a_0,t_1	!sqr_add_c2(a,2,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_1,a_1,t_1	!sqr_add_c(a,1,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12

	mulx	a_0,a_3,t_1	!sqr_add_c2(a,3,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(4),a_4
	mulx	a_1,a_2,t_1	!sqr_add_c2(a,2,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	st	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_4,a_0,t_1	!sqr_add_c2(a,4,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,a_1,t_1	!sqr_add_c2(a,3,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(5),a_5
	mulx	a_2,a_2,t_1	!sqr_add_c(a,2,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_0,a_5,t_1	!sqr_add_c2(a,5,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_4,t_1	!sqr_add_c2(a,4,1,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(6),a_6
	mulx	a_2,a_3,t_1	!sqr_add_c2(a,3,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_6,a_0,t_1	!sqr_add_c2(a,6,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_1,t_1	!sqr_add_c2(a,5,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,a_2,t_1	!sqr_add_c2(a,4,2,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(7),a_7
	mulx	a_3,a_3,t_1	!=!sqr_add_c(a,3,c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	or	c_12,c_3,c_12

	mulx	a_0,a_7,t_1	!sqr_add_c2(a,7,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_6,t_1	!sqr_add_c2(a,6,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,a_5,t_1	!sqr_add_c2(a,5,2,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,a_4,t_1	!sqr_add_c2(a,4,3,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(7)	!r[7]=c2;
	or	c_12,c_3,c_12

	mulx	a_7,a_1,t_1	!sqr_add_c2(a,7,1,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_6,a_2,t_1	!sqr_add_c2(a,6,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_3,t_1	!sqr_add_c2(a,5,3,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,a_4,t_1	!sqr_add_c(a,4,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(8)	!r[8]=c3;
	or	c_12,c_3,c_12

	mulx	a_2,a_7,t_1	!sqr_add_c2(a,7,2,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,a_6,t_1	!sqr_add_c2(a,6,3,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_4,a_5,t_1	!sqr_add_c2(a,5,4,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(9)	!r[9]=c1;
	or	c_12,c_3,c_12

	mulx	a_7,a_3,t_1	!sqr_add_c2(a,7,3,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_6,a_4,t_1	!sqr_add_c2(a,6,4,c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_5,t_1	!sqr_add_c(a,5,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(10)	!r[10]=c2;
	or	c_12,c_3,c_12

	mulx	a_4,a_7,t_1	!sqr_add_c2(a,7,4,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_5,a_6,t_1	!sqr_add_c2(a,6,5,c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(11)	!r[11]=c3;
	or	c_12,c_3,c_12

	mulx	a_7,a_5,t_1	!sqr_add_c2(a,7,5,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_6,a_6,t_1	!sqr_add_c(a,6,c1,c2,c3);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(12)	!r[12]=c1;
	or	c_12,c_3,c_12

	mulx	a_6,a_7,t_1	!sqr_add_c2(a,7,6,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(13)	!r[13]=c2;
	or	c_12,c_3,c_12

	mulx	a_7,a_7,t_1	!sqr_add_c(a,7,c3,c1,c2);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12
	stuw	t_1,rp(14)	!r[14]=c3;
	stuw	c_12,rp(15)	!r[15]=c1;

	ret
	restore	%g0,%g0,%o0

.type	bn_sqr_comba8,#function
.size	bn_sqr_comba8,(.-bn_sqr_comba8)

.align	32

.global bn_sqr_comba4
/*
 * void bn_sqr_comba4(r,a)
 * BN_ULONG *r,*a;
 */
bn_sqr_comba4:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2
	lduw	ap(1),a_1
	mulx	a_0,a_0,t_1	!sqr_add_c(a,0,c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!r[0]=c1;

	lduw	ap(2),a_2
	mulx	a_0,a_1,t_1	!sqr_add_c2(a,1,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,a_0,t_1	!sqr_add_c2(a,2,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_1,a_1,t_1	!sqr_add_c(a,1,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12

	mulx	a_0,a_3,t_1	!sqr_add_c2(a,3,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_2,t_1	!sqr_add_c2(a,2,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_3,a_1,t_1	!sqr_add_c2(a,3,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,a_2,t_1	!sqr_add_c(a,2,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,a_3,t_1	!sqr_add_c2(a,3,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_3,a_3,t_1	!sqr_add_c(a,3,c1,c2,c3);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	stuw	c_12,rp(7)	!r[7]=c2;

	ret
	restore	%g0,%g0,%o0

.type	bn_sqr_comba4,#function
.size	bn_sqr_comba4,(.-bn_sqr_comba4)

.align	32
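/*
 * Reference sketch (editorial addition, not part of the original module).
 * The !mul_add_c(...), !sqr_add_c(...) and !sqr_add_c2(...) annotations in
 * the comba routines above name the column-accumulation steps of
 * crypto/bn/bn_asm.c. The C below is a minimal model of those steps,
 * written from the annotations rather than copied from bn_asm.c, so treat
 * it as an assumption about intent. ref_mul_comba4, ref_sqr_comba4 and the
 * loop formulation are hypothetical names introduced here; the assembly is
 * fully unrolled and keeps c1,c2 packed in one 64-bit register (c_12).
 * sqr_add_c2 is modelled as adding the same product twice, which is what
 * the paired addcc instructions in the squaring routines appear to do.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BN_ULONG;  // 32-bit limb, as loaded by lduw and stored by stuw

// Add the 64-bit product a*b into the three-limb accumulator (c0,c1,c2).
static void mul_add_c(BN_ULONG a, BN_ULONG b,
                      BN_ULONG *c0, BN_ULONG *c1, BN_ULONG *c2)
{
    uint64_t t  = (uint64_t)a * b;                  // full product, like mulx
    uint64_t lo = (uint64_t)*c0 + (uint32_t)t;      // low limb plus low half
    uint64_t hi = (uint64_t)*c1 + (t >> 32) + (lo >> 32);
    *c0 = (BN_ULONG)lo;
    *c1 = (BN_ULONG)hi;
    *c2 += (BN_ULONG)(hi >> 32);                    // carry into third limb
}

// r[0..7] = a[0..3] * b[0..3], one output column at a time.
void ref_mul_comba4(BN_ULONG r[8], const BN_ULONG a[4], const BN_ULONG b[4])
{
    BN_ULONG c0 = 0, c1 = 0, c2 = 0;
    for (int k = 0; k < 7; k++) {
        for (int i = 0; i < 4; i++) {
            int j = k - i;
            if (j >= 0 && j < 4)
                mul_add_c(a[i], b[j], &c0, &c1, &c2);
        }
        r[k] = c0;                  // column finished: store low limb
        c0 = c1; c1 = c2; c2 = 0;   // rotate the accumulator
    }
    assert(c1 == 0 && c2 == 0);     // a 4x4-limb product fits in 8 limbs
    r[7] = c0;
}

// r[0..7] = a[0..3]^2; off-diagonal products are added twice.
void ref_sqr_comba4(BN_ULONG r[8], const BN_ULONG a[4])
{
    BN_ULONG c0 = 0, c1 = 0, c2 = 0;
    for (int k = 0; k < 7; k++) {
        for (int i = 0; i < 4; i++) {
            int j = k - i;
            if (j < 0 || j > i)
                continue;           // visit each unordered pair once
            mul_add_c(a[i], a[j], &c0, &c1, &c2);
            if (i != j)             // sqr_add_c2: same product a second time
                mul_add_c(a[i], a[j], &c0, &c1, &c2);
        }
        r[k] = c0;
        c0 = c1; c1 = c2; c2 = 0;
    }
    assert(c1 == 0 && c2 == 0);
    r[7] = c0;
}
```

 * Such a model is mainly useful as a host-side cross-check: feed the same
 * limbs to the assembly routine and to the reference, then compare r[].
 * The comment block is closed below so the file still assembles.
 */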