xref: /onnv-gate/usr/src/common/openssl/crypto/bn/asm/sparcv8plus.S (revision 0:68f95e015346)
.ident	"sparcv8plus.s, Version 1.4"
.ident	"SPARC v9 ISA artwork by Andy Polyakov <appro@fy.chalmers.se>"

/*
 * ====================================================================
 * Written by Andy Polyakov <appro@fy.chalmers.se> for the OpenSSL
 * project.
 *
 * Rights for redistribution and usage in source and binary forms are
 * granted according to the OpenSSL license. Warranty of any kind is
 * disclaimed.
 * ====================================================================
 */

/*
 * This is my modest contribution to the OpenSSL project (see
 * http://www.openssl.org/ for more information about it) and is
 * a drop-in UltraSPARC ISA replacement for crypto/bn/bn_asm.c
 * module. For updates see http://fy.chalmers.se/~appro/hpe/.
 *
 * Questions-n-answers.
 *
 * Q. How to compile?
 * A. With SC4.x/SC5.x:
 *
 *	cc -xarch=v8plus -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *
 *    and with gcc:
 *
 *	gcc -mcpu=ultrasparc -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *
 *    or if above fails (it does if you have gas installed):
 *
 *	gcc -E bn_asm.sparc.v8plus.S | as -xarch=v8plus /dev/fd/0 -o bn_asm.o
 *
 *    Quick-n-dirty way to fuse the module into the library.
 *    Provided that the library is already configured and built
 *    (in 0.9.2 case with no-asm option):
 *
 *	# cd crypto/bn
 *	# cp /some/place/bn_asm.sparc.v8plus.S .
 *	# cc -xarch=v8plus -c bn_asm.sparc.v8plus.S -o bn_asm.o
 *	# make
 *	# cd ../..
 *	# make; make test
 *
 *    Quick-n-dirty way to get rid of it:
 *
 *	# cd crypto/bn
 *	# touch bn_asm.c
 *	# make
 *	# cd ../..
 *	# make; make test
 *
 * Q. V8plus architecture? What kind of beast is that?
 * A. Well, it's rather a programming model than an architecture...
 *    It's actually v9-compliant, i.e. *any* UltraSPARC, CPU under
 *    special conditions, namely when kernel doesn't preserve upper
 *    32 bits of otherwise 64-bit registers during a context switch.
 *
 * Q. Why just UltraSPARC? What about SuperSPARC?
 * A. Original release did target UltraSPARC only. Now a SuperSPARC
 *    version is provided as well. Both versions share bn_*comba[48]
 *    implementations (see comment later in code for explanation).
 *    But what's so special about this UltraSPARC implementation?
 *    Why didn't I let compiler do the job? Trouble is that most of
 *    available compilers (well, SC5.0 is the only exception) don't
 *    attempt to take advantage of UltraSPARC's 64-bitness under
 *    32-bit kernels even though it's perfectly possible (see next
 *    question).
 *
 * Q. 64-bit registers under 32-bit kernels? Didn't you just say it
 *    doesn't work?
 * A. You can't address *all* registers as 64-bit wide:-( The catch is
 *    that you actually may rely upon %o0-%o5 and %g1-%g4 being fully
 *    preserved if you're in a leaf function, i.e. one never calling
 *    any other functions. All functions in this module are leaf and
 *    10 registers is a handful. And as a matter of fact non-"comba"
 *    routines don't require even that much and I could even afford to
 *    not allocate own stack frame for 'em:-)
 *
 * Q. What about 64-bit kernels?
 * A. What about 'em? Just kidding:-) Pure 64-bit version is currently
 *    under evaluation and development...
 *
 * Q. What about shared libraries?
 * A. What about 'em? Kidding again:-) Code does *not* contain any
 *    code position dependencies and it's safe to include it into
 *    shared library as is.
 *
 * Q. How much faster does it go?
 * A. Do you have a good benchmark? In either case below is what I
 *    experience with crypto/bn/expspeed.c test program:
 *
 *	v8plus module on U10/300MHz against bn_asm.c compiled with:
 *
 *	cc-5.0 -xarch=v8plus -xO5 -xdepend	+7-12%
 *	cc-4.2 -xarch=v8plus -xO5 -xdepend	+25-35%
 *	egcs-1.1.2 -mcpu=ultrasparc -O3		+35-45%
 *
 *	v8 module on SS10/60MHz against bn_asm.c compiled with:
 *
 *	cc-5.0 -xarch=v8 -xO5 -xdepend		+7-10%
 *	cc-4.2 -xarch=v8 -xO5 -xdepend		+10%
 *	egcs-1.1.2 -mv8 -O3			+35-45%
 *
 *    As you can see it's damn hard to beat the new Sun C compiler
 *    and it's primarily GNU C users who will appreciate this
 *    assembler implementation:-)
 */

/*
 * Revision history.
 *
 * 1.0	- initial release;
 * 1.1	- new loop unrolling model(*);
 *	- some more fine tuning;
 * 1.2	- made gas friendly;
 *	- updates to documentation concerning v9;
 *	- new performance comparison matrix;
 * 1.3	- fixed problem with /usr/ccs/lib/cpp;
 * 1.4	- native V9 bn_*_comba[48] implementation (15% more efficient)
 *	  resulting in slight overall performance kick;
 *	- some retunes;
 *	- support for GNU as added;
 *
 * (*)	Originally unrolled loop looked like this:
 *	    for (;;) {
 *		op(p+0); if (--n==0) break;
 *		op(p+1); if (--n==0) break;
 *		op(p+2); if (--n==0) break;
 *		op(p+3); if (--n==0) break;
 *		p+=4;
 *	    }
 *	I unroll according to following:
 *	    while (n&~3) {
 *		op(p+0); op(p+1); op(p+2); op(p+3);
 *		p+=4; n-=4;
 *	    }
 *	    if (n) {
 *		op(p+0); if (--n==0) return;
 *		op(p+1); if (--n==0) return;
 *		op(p+2); return;
 *	    }
 */
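
/*
 * For illustration only: a compilable C sketch of the unrolling model
 * described above (a bulk loop over groups of four, then a tail of at
 * most three elements). op() and unrolled_apply() are hypothetical
 * stand-ins, not part of this module; op() simply increments the
 * element so the effect can be observed.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element operation: bump the element by one. */
static void op(unsigned *p) { ++*p; }

/* Apply op() to n elements: groups of four first, then a 1..3 element tail. */
static void unrolled_apply(unsigned *p, size_t n)
{
    assert(p != NULL || n == 0);
    while (n & ~(size_t)3) {            /* at least four elements left */
        op(p + 0); op(p + 1); op(p + 2); op(p + 3);
        p += 4; n -= 4;
    }
    if (n) {                            /* one to three elements left */
        op(p + 0); if (--n == 0) return;
        op(p + 1); if (--n == 0) return;
        op(p + 2);
    }
}
```

/*
 * The bulk loop tests the count once per four elements instead of once
 * per element, which is where the saving over the original model comes
 * from.
 */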

/*
 * GNU assembler can't stand stuw:-(
 */
#define stuw st

.section	".text",#alloc,#execinstr
.file		"bn_asm.sparc.v8plus.S"

.align	32

.global bn_mul_add_words
/*
 * BN_ULONG bn_mul_add_words(rp,ap,num,w)
 * BN_ULONG *rp,*ap;
 * int num;
 * BN_ULONG w;
 */
bn_mul_add_words:
	brgz,a	%o2,.L_bn_mul_add_words_proceed
	lduw	[%o1],%g2
	retl
	clr	%o0

.L_bn_mul_add_words_proceed:
	srl	%o3,%g0,%o3	! clruw	%o3
	andcc	%o2,-4,%g0
	bz,pn	%icc,.L_bn_mul_add_words_tail
	clr	%o5

.L_bn_mul_add_words_loop:	! wow! 32 aligned!
	lduw	[%o0],%g1
	lduw	[%o1+4],%g3
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	nop
	add	%o4,%g2,%o4
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5

	lduw	[%o0+4],%g1
	lduw	[%o1+8],%g2
	mulx	%o3,%g3,%g3
	add	%g1,%o5,%o4
	dec	4,%o2
	add	%o4,%g3,%o4
	stuw	%o4,[%o0+4]
	srlx	%o4,32,%o5

	lduw	[%o0+8],%g1
	lduw	[%o1+12],%g3
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	inc	16,%o1
	add	%o4,%g2,%o4
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5

	lduw	[%o0+12],%g1
	mulx	%o3,%g3,%g3
	add	%g1,%o5,%o4
	inc	16,%o0
	add	%o4,%g3,%o4
	andcc	%o2,-4,%g0
	stuw	%o4,[%o0-4]
	srlx	%o4,32,%o5
	bnz,a,pt	%icc,.L_bn_mul_add_words_loop
	lduw	[%o1],%g2

	brnz,a,pn	%o2,.L_bn_mul_add_words_tail
	lduw	[%o1],%g2
.L_bn_mul_add_words_return:
	retl
	mov	%o5,%o0

.L_bn_mul_add_words_tail:
	lduw	[%o0],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	dec	%o2
	add	%o4,%g2,%o4
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_add_words_return
	stuw	%o4,[%o0]

	lduw	[%o1+4],%g2
	lduw	[%o0+4],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	dec	%o2
	add	%o4,%g2,%o4
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_add_words_return
	stuw	%o4,[%o0+4]

	lduw	[%o1+8],%g2
	lduw	[%o0+8],%g1
	mulx	%o3,%g2,%g2
	add	%g1,%o5,%o4
	add	%o4,%g2,%o4
	stuw	%o4,[%o0+8]
	retl
	srlx	%o4,32,%o0

.type	bn_mul_add_words,#function
.size	bn_mul_add_words,(.-bn_mul_add_words)

.align	32

.global bn_mul_words
/*
 * BN_ULONG bn_mul_words(rp,ap,num,w)
 * BN_ULONG *rp,*ap;
 * int num;
 * BN_ULONG w;
 */
bn_mul_words:
	brgz,a	%o2,.L_bn_mul_words_proceeed
	lduw	[%o1],%g2
	retl
	clr	%o0

.L_bn_mul_words_proceeed:
	srl	%o3,%g0,%o3	! clruw	%o3
	andcc	%o2,-4,%g0
	bz,pn	%icc,.L_bn_mul_words_tail
	clr	%o5

.L_bn_mul_words_loop:		! wow! 32 aligned!
	lduw	[%o1+4],%g3
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	nop
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5

	lduw	[%o1+8],%g2
	mulx	%o3,%g3,%g3
	add	%g3,%o5,%o4
	dec	4,%o2
	stuw	%o4,[%o0+4]
	srlx	%o4,32,%o5

	lduw	[%o1+12],%g3
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	inc	16,%o1
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5

	mulx	%o3,%g3,%g3
	add	%g3,%o5,%o4
	inc	16,%o0
	stuw	%o4,[%o0-4]
	srlx	%o4,32,%o5
	andcc	%o2,-4,%g0
	bnz,a,pt	%icc,.L_bn_mul_words_loop
	lduw	[%o1],%g2
	nop
	nop

	brnz,a,pn	%o2,.L_bn_mul_words_tail
	lduw	[%o1],%g2
.L_bn_mul_words_return:
	retl
	mov	%o5,%o0

.L_bn_mul_words_tail:
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	dec	%o2
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_words_return
	stuw	%o4,[%o0]

	lduw	[%o1+4],%g2
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	dec	%o2
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_mul_words_return
	stuw	%o4,[%o0+4]

	lduw	[%o1+8],%g2
	mulx	%o3,%g2,%g2
	add	%g2,%o5,%o4
	stuw	%o4,[%o0+8]
	retl
	srlx	%o4,32,%o0

.type	bn_mul_words,#function
.size	bn_mul_words,(.-bn_mul_words)

.align  32
.global	bn_sqr_words
/*
 * void bn_sqr_words(r,a,n)
 * BN_ULONG *r,*a;
 * int n;
 */
bn_sqr_words:
	brgz,a	%o2,.L_bn_sqr_words_proceeed
	lduw	[%o1],%g2
	retl
	clr	%o0

.L_bn_sqr_words_proceeed:
	andcc	%o2,-4,%g0
	nop
	bz,pn	%icc,.L_bn_sqr_words_tail
	nop

.L_bn_sqr_words_loop:		! wow! 32 aligned!
	lduw	[%o1+4],%g3
	mulx	%g2,%g2,%o4
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5
	stuw	%o5,[%o0+4]
	nop

	lduw	[%o1+8],%g2
	mulx	%g3,%g3,%o4
	dec	4,%o2
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5
	stuw	%o5,[%o0+12]

	lduw	[%o1+12],%g3
	mulx	%g2,%g2,%o4
	srlx	%o4,32,%o5
	stuw	%o4,[%o0+16]
	inc	16,%o1
	stuw	%o5,[%o0+20]

	mulx	%g3,%g3,%o4
	inc	32,%o0
	stuw	%o4,[%o0-8]
	srlx	%o4,32,%o5
	andcc	%o2,-4,%g2
	stuw	%o5,[%o0-4]
	bnz,a,pt	%icc,.L_bn_sqr_words_loop
	lduw	[%o1],%g2
	nop

	brnz,a,pn	%o2,.L_bn_sqr_words_tail
	lduw	[%o1],%g2
.L_bn_sqr_words_return:
	retl
	clr	%o0

.L_bn_sqr_words_tail:
	mulx	%g2,%g2,%o4
	dec	%o2
	stuw	%o4,[%o0]
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_sqr_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+4],%g2
	mulx	%g2,%g2,%o4
	dec	%o2
	stuw	%o4,[%o0+8]
	srlx	%o4,32,%o5
	brz,pt	%o2,.L_bn_sqr_words_return
	stuw	%o5,[%o0+12]

	lduw	[%o1+8],%g2
	mulx	%g2,%g2,%o4
	srlx	%o4,32,%o5
	stuw	%o4,[%o0+16]
	stuw	%o5,[%o0+20]
	retl
	clr	%o0

.type	bn_sqr_words,#function
.size	bn_sqr_words,(.-bn_sqr_words)

.align	32
.global bn_div_words
/*
 * BN_ULONG bn_div_words(h,l,d)
 * BN_ULONG h,l,d;
 */
bn_div_words:
	sllx	%o0,32,%o0
	or	%o0,%o1,%o0
	udivx	%o0,%o2,%o0
	retl
	srl	%o0,%g0,%o0	! clruw	%o0

.type	bn_div_words,#function
.size	bn_div_words,(.-bn_div_words)
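
/*
 * A small C sketch of bn_div_words above: h and l are glued into one
 * 64-bit dividend, divided by d, and the quotient is truncated to its
 * low 32 bits (the final srl). The assembler has no explicit check for
 * d == 0; the assert below is an addition of this sketch, as is the
 * name ref_div_words.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model: low 32 bits of ((h << 32) | l) / d. */
static uint32_t ref_div_words(uint32_t h, uint32_t l, uint32_t d)
{
    assert(d != 0);                          /* not checked by the assembler */
    uint64_t n = ((uint64_t)h << 32) | l;    /* sllx + or */
    return (uint32_t)(n / d);                /* udivx + srl */
}
```

/*
 * Note that the quotient only fits in 32 bits when h < d; otherwise the
 * upper bits are silently dropped.
 */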

.align	32

.global bn_add_words
/*
 * BN_ULONG bn_add_words(rp,ap,bp,n)
 * BN_ULONG *rp,*ap,*bp;
 * int n;
 */
bn_add_words:
	brgz,a	%o3,.L_bn_add_words_proceed
	lduw	[%o1],%o4
	retl
	clr	%o0

.L_bn_add_words_proceed:
	andcc	%o3,-4,%g0
	bz,pn	%icc,.L_bn_add_words_tail
	addcc	%g0,0,%g0	! clear carry flag
	nop

.L_bn_add_words_loop:		! wow! 32 aligned!
	dec	4,%o3
	lduw	[%o2],%o5
	lduw	[%o1+4],%g1
	lduw	[%o2+4],%g2
	lduw	[%o1+8],%g3
	lduw	[%o2+8],%g4
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0]

	lduw	[%o1+12],%o4
	lduw	[%o2+12],%o5
	inc	16,%o1
	addccc	%g1,%g2,%g1
	stuw	%g1,[%o0+4]

	inc	16,%o2
	addccc	%g3,%g4,%g3
	stuw	%g3,[%o0+8]

	inc	16,%o0
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0-4]
	and	%o3,-4,%g1
	brnz,a,pt	%g1,.L_bn_add_words_loop
	lduw	[%o1],%o4

	brnz,a,pn	%o3,.L_bn_add_words_tail
	lduw	[%o1],%o4
.L_bn_add_words_return:
	clr	%o0
	retl
	movcs	%icc,1,%o0
	nop

.L_bn_add_words_tail:
	lduw	[%o2],%o5
	dec	%o3
	addccc	%o5,%o4,%o5
	brz,pt	%o3,.L_bn_add_words_return
	stuw	%o5,[%o0]

	lduw	[%o1+4],%o4
	lduw	[%o2+4],%o5
	dec	%o3
	addccc	%o5,%o4,%o5
	brz,pt	%o3,.L_bn_add_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+8],%o4
	lduw	[%o2+8],%o5
	addccc	%o5,%o4,%o5
	stuw	%o5,[%o0+8]
	clr	%o0
	retl
	movcs	%icc,1,%o0

.type	bn_add_words,#function
.size	bn_add_words,(.-bn_add_words)

.global bn_sub_words
/*
 * BN_ULONG bn_sub_words(rp,ap,bp,n)
 * BN_ULONG *rp,*ap,*bp;
 * int n;
 */
bn_sub_words:
	brgz,a	%o3,.L_bn_sub_words_proceed
	lduw	[%o1],%o4
	retl
	clr	%o0

.L_bn_sub_words_proceed:
	andcc	%o3,-4,%g0
	bz,pn	%icc,.L_bn_sub_words_tail
	addcc	%g0,0,%g0	! clear carry flag
	nop

.L_bn_sub_words_loop:		! wow! 32 aligned!
	dec	4,%o3
	lduw	[%o2],%o5
	lduw	[%o1+4],%g1
	lduw	[%o2+4],%g2
	lduw	[%o1+8],%g3
	lduw	[%o2+8],%g4
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0]

	lduw	[%o1+12],%o4
	lduw	[%o2+12],%o5
	inc	16,%o1
	subccc	%g1,%g2,%g2
	stuw	%g2,[%o0+4]

	inc	16,%o2
	subccc	%g3,%g4,%g4
	stuw	%g4,[%o0+8]

	inc	16,%o0
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0-4]
	and	%o3,-4,%g1
	brnz,a,pt	%g1,.L_bn_sub_words_loop
	lduw	[%o1],%o4

	brnz,a,pn	%o3,.L_bn_sub_words_tail
	lduw	[%o1],%o4
.L_bn_sub_words_return:
	clr	%o0
	retl
	movcs	%icc,1,%o0
	nop

.L_bn_sub_words_tail:		! wow! 32 aligned!
	lduw	[%o2],%o5
	dec	%o3
	subccc	%o4,%o5,%o5
	brz,pt	%o3,.L_bn_sub_words_return
	stuw	%o5,[%o0]

	lduw	[%o1+4],%o4
	lduw	[%o2+4],%o5
	dec	%o3
	subccc	%o4,%o5,%o5
	brz,pt	%o3,.L_bn_sub_words_return
	stuw	%o5,[%o0+4]

	lduw	[%o1+8],%o4
	lduw	[%o2+8],%o5
	subccc	%o4,%o5,%o5
	stuw	%o5,[%o0+8]
	clr	%o0
	retl
	movcs	%icc,1,%o0

.type	bn_sub_words,#function
.size	bn_sub_words,(.-bn_sub_words)

/*
 * Code below depends on the fact that upper parts of the %l0-%l7
 * and %i0-%i7 are zeroed by kernel after context switch. In
 * previous versions this comment stated that "the trouble is that
 * it's not feasible to implement the mumbo-jumbo in less V9
 * instructions:-(" which apparently isn't true thanks to
 * 'bcs,a %xcc,.+8; inc %rd' pair. But the performance improvement
 * results not from the shorter code, but from elimination of
 * multicycle non-pairable 'rd %y,%rd' instructions.
 *
 *							Andy.
 */

#define FRAME_SIZE	-96

/*
 * Here is register usage map for *all* routines below.
 */
#define t_1	%o0
#define	t_2	%o1
#define c_12	%o2
#define c_3	%o3

#define ap(I)	[%i1+4*I]
#define bp(I)	[%i2+4*I]
#define rp(I)	[%i0+4*I]

#define	a_0	%l0
#define	a_1	%l1
#define	a_2	%l2
#define	a_3	%l3
#define	a_4	%l4
#define	a_5	%l5
#define	a_6	%l6
#define	a_7	%l7

#define	b_0	%i3
#define	b_1	%i4
#define	b_2	%i5
#define	b_3	%o4
#define	b_4	%o5
#define	b_5	%o7
#define	b_6	%g1
#define	b_7	%g4
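
/*
 * One way to picture how the comba routines below use c_12, c_3 and
 * t_2, as a C sketch under my reading of the code: c_12 is a 64-bit
 * accumulator holding two result words at once, and every carry out
 * of bit 63 is recorded in c_3 as 2^32 (the constant kept in t_2), so
 * that after shifting c_12 down by one word the recorded carries can
 * simply be OR-ed in. The struct and function names are hypothetical,
 * and comba_mul2 is a 2x2-word toy, not one of the module's routines.
 */

```c
#include <assert.h>
#include <stdint.h>

/* c12 models c_12; c3 models c_3, kept pre-scaled by 2^32. */
struct comba_acc { uint64_t c12, c3; };

/* mulx + addcc + "bcs,a %xcc,.+8; add c_3,t_2,c_3" */
static void comba_mul_add(struct comba_acc *acc, uint32_t a, uint32_t b)
{
    uint64_t t = (uint64_t)a * b;
    acc->c12 += t;
    if (acc->c12 < t)                    /* carry out of 64 bits */
        acc->c3 += (uint64_t)1 << 32;
}

/* stuw + srlx + or, then clr c_3 for the next column */
static uint32_t comba_next_column(struct comba_acc *acc)
{
    uint32_t word = (uint32_t)acc->c12;
    acc->c12 = (acc->c12 >> 32) | acc->c3;
    acc->c3 = 0;
    return word;
}

/* Toy 2x2-word product r[0..3] = a[0..1] * b[0..1], column by column. */
static void comba_mul2(uint32_t r[4], const uint32_t a[2], const uint32_t b[2])
{
    struct comba_acc acc = {0, 0};
    assert(r != 0 && a != 0 && b != 0);
    comba_mul_add(&acc, a[0], b[0]);
    r[0] = comba_next_column(&acc);
    comba_mul_add(&acc, a[0], b[1]);
    comba_mul_add(&acc, a[1], b[0]);
    r[1] = comba_next_column(&acc);
    comba_mul_add(&acc, a[1], b[1]);
    r[2] = comba_next_column(&acc);
    r[3] = (uint32_t)acc.c12;
}
```

/*
 * The OR is safe because after the shift c12 occupies only bits 0..31
 * while c3 is a multiple of 2^32.
 */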

.align	32
.global bn_mul_comba8
/*
 * void bn_mul_comba8(r,a,b)
 * BN_ULONG *r,*a,*b;
 */
bn_mul_comba8:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2
	lduw	bp(0),b_0	!=
	lduw	bp(1),b_1
	mulx	a_0,b_0,t_1	!mul_add_c(a[0],b[0],c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!=!r[0]=c1;

	lduw	ap(1),a_1
	mulx	a_0,b_1,t_1	!mul_add_c(a[0],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3		!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(2),a_2
	mulx	a_1,b_0,t_1	!=!mul_add_c(a[1],b[0],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,b_0,t_1	!mul_add_c(a[2],b[0],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(2),b_2	!=
	mulx	a_1,b_1,t_1	!mul_add_c(a[1],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(3),b_3
	mulx	a_0,b_2,t_1	!mul_add_c(a[0],b[2],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_3,t_1	!mul_add_c(a[0],b[3],c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_2,t_1	!=!mul_add_c(a[1],b[2],c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_2,b_1,t_1	!mul_add_c(a[2],b[1],c1,c2,c3);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(4),a_4
	mulx	a_3,b_0,t_1	!=!mul_add_c(a[3],b[0],c1,c2,c3);!=
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_4,b_0,t_1	!mul_add_c(a[4],b[0],c2,c3,c1);
	addcc	c_12,t_1,c_12	!=
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_3,b_1,t_1	!=!mul_add_c(a[3],b[1],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,b_2,t_1	!=!mul_add_c(a[2],b[2],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	bp(4),b_4	!=
	mulx	a_1,b_3,t_1	!mul_add_c(a[1],b[3],c2,c3,c1);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3	!=
	lduw	bp(5),b_5
	mulx	a_0,b_4,t_1	!mul_add_c(a[0],b[4],c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12	!=

	mulx	a_0,b_5,t_1	!mul_add_c(a[0],b[5],c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_1,b_4,t_1	!mul_add_c(a[1],b[4],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_2,b_3,t_1	!mul_add_c(a[2],b[3],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	mulx	a_3,b_2,t_1	!mul_add_c(a[3],b[2],c3,c1,c2);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8	!=
	add	c_3,t_2,c_3
	lduw	ap(5),a_5
	mulx	a_4,b_1,t_1	!mul_add_c(a[4],b[1],c3,c1,c2);
	addcc	c_12,t_1,c_12	!=
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(6),a_6
	mulx	a_5,b_0,t_1	!=!mul_add_c(a[5],b[0],c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12	!=
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_6,b_0,t_1	!mul_add_c(a[6],b[0],c1,c2,c3);
775*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12	!=
776*0Sstevel@tonic-gate	clr	c_3
777*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
778*0Sstevel@tonic-gate	add	c_3,t_2,c_3
779*0Sstevel@tonic-gate	mulx	a_5,b_1,t_1	!=!mul_add_c(a[5],b[1],c1,c2,c3);
780*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
781*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
782*0Sstevel@tonic-gate	add	c_3,t_2,c_3
783*0Sstevel@tonic-gate	mulx	a_4,b_2,t_1	!=!mul_add_c(a[4],b[2],c1,c2,c3);
784*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
785*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
786*0Sstevel@tonic-gate	add	c_3,t_2,c_3
787*0Sstevel@tonic-gate	mulx	a_3,b_3,t_1	!=!mul_add_c(a[3],b[3],c1,c2,c3);
788*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
789*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
790*0Sstevel@tonic-gate	add	c_3,t_2,c_3
791*0Sstevel@tonic-gate	mulx	a_2,b_4,t_1	!=!mul_add_c(a[2],b[4],c1,c2,c3);
792*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
793*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
794*0Sstevel@tonic-gate	add	c_3,t_2,c_3
795*0Sstevel@tonic-gate	lduw	bp(6),b_6	!=
796*0Sstevel@tonic-gate	mulx	a_1,b_5,t_1	!mul_add_c(a[1],b[5],c1,c2,c3);
797*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
798*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
799*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
800*0Sstevel@tonic-gate	lduw	bp(7),b_7
801*0Sstevel@tonic-gate	mulx	a_0,b_6,t_1	!mul_add_c(a[0],b[6],c1,c2,c3);
802*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
803*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
804*0Sstevel@tonic-gate	add	c_3,t_2,c_3
805*0Sstevel@tonic-gate	srlx	t_1,32,c_12
806*0Sstevel@tonic-gate	stuw	t_1,rp(6)	!r[6]=c1;
807*0Sstevel@tonic-gate	or	c_12,c_3,c_12	!=
808*0Sstevel@tonic-gate
809*0Sstevel@tonic-gate	mulx	a_0,b_7,t_1	!mul_add_c(a[0],b[7],c2,c3,c1);
810*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
811*0Sstevel@tonic-gate	clr	c_3
812*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
813*0Sstevel@tonic-gate	add	c_3,t_2,c_3
814*0Sstevel@tonic-gate	mulx	a_1,b_6,t_1	!mul_add_c(a[1],b[6],c2,c3,c1);
815*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
816*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
817*0Sstevel@tonic-gate	add	c_3,t_2,c_3
818*0Sstevel@tonic-gate	mulx	a_2,b_5,t_1	!mul_add_c(a[2],b[5],c2,c3,c1);
819*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
820*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
821*0Sstevel@tonic-gate	add	c_3,t_2,c_3
822*0Sstevel@tonic-gate	mulx	a_3,b_4,t_1	!mul_add_c(a[3],b[4],c2,c3,c1);
823*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
824*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
825*0Sstevel@tonic-gate	add	c_3,t_2,c_3
826*0Sstevel@tonic-gate	mulx	a_4,b_3,t_1	!mul_add_c(a[4],b[3],c2,c3,c1);
827*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
828*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
829*0Sstevel@tonic-gate	add	c_3,t_2,c_3
830*0Sstevel@tonic-gate	mulx	a_5,b_2,t_1	!mul_add_c(a[5],b[2],c2,c3,c1);
831*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
832*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
833*0Sstevel@tonic-gate	add	c_3,t_2,c_3
834*0Sstevel@tonic-gate	lduw	ap(7),a_7
835*0Sstevel@tonic-gate	mulx	a_6,b_1,t_1	!=!mul_add_c(a[6],b[1],c2,c3,c1);
836*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
837*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
838*0Sstevel@tonic-gate	add	c_3,t_2,c_3
839*0Sstevel@tonic-gate	mulx	a_7,b_0,t_1	!=!mul_add_c(a[7],b[0],c2,c3,c1);
840*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
841*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
842*0Sstevel@tonic-gate	add	c_3,t_2,c_3
843*0Sstevel@tonic-gate	srlx	t_1,32,c_12	!=
844*0Sstevel@tonic-gate	stuw	t_1,rp(7)	!r[7]=c2;
845*0Sstevel@tonic-gate	or	c_12,c_3,c_12
846*0Sstevel@tonic-gate
847*0Sstevel@tonic-gate	mulx	a_7,b_1,t_1	!=!mul_add_c(a[7],b[1],c3,c1,c2);
848*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
849*0Sstevel@tonic-gate	clr	c_3
850*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
851*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
852*0Sstevel@tonic-gate	mulx	a_6,b_2,t_1	!mul_add_c(a[6],b[2],c3,c1,c2);
853*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
854*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
855*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
856*0Sstevel@tonic-gate	mulx	a_5,b_3,t_1	!mul_add_c(a[5],b[3],c3,c1,c2);
857*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
858*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
859*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
860*0Sstevel@tonic-gate	mulx	a_4,b_4,t_1	!mul_add_c(a[4],b[4],c3,c1,c2);
861*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
862*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
863*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
864*0Sstevel@tonic-gate	mulx	a_3,b_5,t_1	!mul_add_c(a[3],b[5],c3,c1,c2);
865*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
866*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
867*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
868*0Sstevel@tonic-gate	mulx	a_2,b_6,t_1	!mul_add_c(a[2],b[6],c3,c1,c2);
869*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
870*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
871*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
872*0Sstevel@tonic-gate	mulx	a_1,b_7,t_1	!mul_add_c(a[1],b[7],c3,c1,c2);
873*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
874*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
875*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
876*0Sstevel@tonic-gate	srlx	t_1,32,c_12
877*0Sstevel@tonic-gate	stuw	t_1,rp(8)	!r[8]=c3;
878*0Sstevel@tonic-gate	or	c_12,c_3,c_12
879*0Sstevel@tonic-gate
880*0Sstevel@tonic-gate	mulx	a_2,b_7,t_1	!=!mul_add_c(a[2],b[7],c1,c2,c3);
881*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
882*0Sstevel@tonic-gate	clr	c_3
883*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
884*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
885*0Sstevel@tonic-gate	mulx	a_3,b_6,t_1	!mul_add_c(a[3],b[6],c1,c2,c3);
886*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
887*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
888*0Sstevel@tonic-gate	add	c_3,t_2,c_3
889*0Sstevel@tonic-gate	mulx	a_4,b_5,t_1	!mul_add_c(a[4],b[5],c1,c2,c3);
890*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
891*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
892*0Sstevel@tonic-gate	add	c_3,t_2,c_3
893*0Sstevel@tonic-gate	mulx	a_5,b_4,t_1	!mul_add_c(a[5],b[4],c1,c2,c3);
894*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
895*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
896*0Sstevel@tonic-gate	add	c_3,t_2,c_3
897*0Sstevel@tonic-gate	mulx	a_6,b_3,t_1	!mul_add_c(a[6],b[3],c1,c2,c3);
898*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
899*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
900*0Sstevel@tonic-gate	add	c_3,t_2,c_3
901*0Sstevel@tonic-gate	mulx	a_7,b_2,t_1	!mul_add_c(a[7],b[2],c1,c2,c3);
902*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
903*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
904*0Sstevel@tonic-gate	add	c_3,t_2,c_3
905*0Sstevel@tonic-gate	srlx	t_1,32,c_12
906*0Sstevel@tonic-gate	stuw	t_1,rp(9)	!r[9]=c1;
907*0Sstevel@tonic-gate	or	c_12,c_3,c_12	!=
908*0Sstevel@tonic-gate
909*0Sstevel@tonic-gate	mulx	a_7,b_3,t_1	!mul_add_c(a[7],b[3],c2,c3,c1);
910*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
911*0Sstevel@tonic-gate	clr	c_3
912*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
913*0Sstevel@tonic-gate	add	c_3,t_2,c_3
914*0Sstevel@tonic-gate	mulx	a_6,b_4,t_1	!mul_add_c(a[6],b[4],c2,c3,c1);
915*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
916*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
917*0Sstevel@tonic-gate	add	c_3,t_2,c_3
918*0Sstevel@tonic-gate	mulx	a_5,b_5,t_1	!mul_add_c(a[5],b[5],c2,c3,c1);
919*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
920*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
921*0Sstevel@tonic-gate	add	c_3,t_2,c_3
922*0Sstevel@tonic-gate	mulx	a_4,b_6,t_1	!mul_add_c(a[4],b[6],c2,c3,c1);
923*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
924*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
925*0Sstevel@tonic-gate	add	c_3,t_2,c_3
926*0Sstevel@tonic-gate	mulx	a_3,b_7,t_1	!mul_add_c(a[3],b[7],c2,c3,c1);
927*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
928*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
929*0Sstevel@tonic-gate	add	c_3,t_2,c_3
930*0Sstevel@tonic-gate	srlx	t_1,32,c_12
931*0Sstevel@tonic-gate	stuw	t_1,rp(10)	!r[10]=c2;
932*0Sstevel@tonic-gate	or	c_12,c_3,c_12	!=
933*0Sstevel@tonic-gate
934*0Sstevel@tonic-gate	mulx	a_4,b_7,t_1	!mul_add_c(a[4],b[7],c3,c1,c2);
935*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
936*0Sstevel@tonic-gate	clr	c_3
937*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
938*0Sstevel@tonic-gate	add	c_3,t_2,c_3
939*0Sstevel@tonic-gate	mulx	a_5,b_6,t_1	!mul_add_c(a[5],b[6],c3,c1,c2);
940*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
941*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
942*0Sstevel@tonic-gate	add	c_3,t_2,c_3
943*0Sstevel@tonic-gate	mulx	a_6,b_5,t_1	!mul_add_c(a[6],b[5],c3,c1,c2);
944*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
945*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
946*0Sstevel@tonic-gate	add	c_3,t_2,c_3
947*0Sstevel@tonic-gate	mulx	a_7,b_4,t_1	!mul_add_c(a[7],b[4],c3,c1,c2);
948*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
949*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
950*0Sstevel@tonic-gate	add	c_3,t_2,c_3
951*0Sstevel@tonic-gate	srlx	t_1,32,c_12
952*0Sstevel@tonic-gate	stuw	t_1,rp(11)	!r[11]=c3;
953*0Sstevel@tonic-gate	or	c_12,c_3,c_12	!=
954*0Sstevel@tonic-gate
955*0Sstevel@tonic-gate	mulx	a_7,b_5,t_1	!mul_add_c(a[7],b[5],c1,c2,c3);
956*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
957*0Sstevel@tonic-gate	clr	c_3
958*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
959*0Sstevel@tonic-gate	add	c_3,t_2,c_3
960*0Sstevel@tonic-gate	mulx	a_6,b_6,t_1	!mul_add_c(a[6],b[6],c1,c2,c3);
961*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
962*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
963*0Sstevel@tonic-gate	add	c_3,t_2,c_3
964*0Sstevel@tonic-gate	mulx	a_5,b_7,t_1	!mul_add_c(a[5],b[7],c1,c2,c3);
965*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
966*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
967*0Sstevel@tonic-gate	add	c_3,t_2,c_3
968*0Sstevel@tonic-gate	srlx	t_1,32,c_12
969*0Sstevel@tonic-gate	stuw	t_1,rp(12)	!r[12]=c1;
970*0Sstevel@tonic-gate	or	c_12,c_3,c_12	!=
971*0Sstevel@tonic-gate
972*0Sstevel@tonic-gate	mulx	a_6,b_7,t_1	!mul_add_c(a[6],b[7],c2,c3,c1);
973*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
974*0Sstevel@tonic-gate	clr	c_3
975*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
976*0Sstevel@tonic-gate	add	c_3,t_2,c_3
977*0Sstevel@tonic-gate	mulx	a_7,b_6,t_1	!mul_add_c(a[7],b[6],c2,c3,c1);
978*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
979*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
980*0Sstevel@tonic-gate	add	c_3,t_2,c_3
981*0Sstevel@tonic-gate	srlx	t_1,32,c_12
982*0Sstevel@tonic-gate	stuw	t_1,rp(13)	!r[13]=c2;
983*0Sstevel@tonic-gate	or	c_12,c_3,c_12	!=
984*0Sstevel@tonic-gate
985*0Sstevel@tonic-gate	mulx	a_7,b_7,t_1	!mul_add_c(a[7],b[7],c3,c1,c2);
986*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
987*0Sstevel@tonic-gate	srlx	t_1,32,c_12	!=
988*0Sstevel@tonic-gate	stuw	t_1,rp(14)	!r[14]=c3;
989*0Sstevel@tonic-gate	stuw	c_12,rp(15)	!r[15]=c1;
990*0Sstevel@tonic-gate
991*0Sstevel@tonic-gate	ret
992*0Sstevel@tonic-gate	restore	%g0,%g0,%o0	!=
993*0Sstevel@tonic-gate
994*0Sstevel@tonic-gate.type	bn_mul_comba8,#function
995*0Sstevel@tonic-gate.size	bn_mul_comba8,(.-bn_mul_comba8)
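
/*
 * Reference sketch (an editorial aid, not part of the original module).
 * The mul_add_c() comments above suggest that the routine forms the
 * product one result column at a time, keeping a multi-word running sum.
 * The C model below is a minimal sketch of that idea, assuming BN_ULONG
 * is a 32-bit word (as the lduw/stuw accesses suggest) and that r does
 * not overlap a or b. The name ref_mul_comba and the lo/hi accumulator
 * split are hypothetical and do not appear in the assembly.
 *

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BN_ULONG;	// assumption: 32-bit words

// Hypothetical reference model: r[0..2n-1] = a[0..n-1] * b[0..n-1],
// computed column by column in the order the mul_add_c() comments list.
static void ref_mul_comba(BN_ULONG *r, const BN_ULONG *a,
			  const BN_ULONG *b, int n)
{
	uint64_t lo = 0;	// low 64 bits of the running column sum
	uint64_t hi = 0;	// carries out of lo, in units of 2^64
	int i, k;

	assert(n > 0);
	for (k = 0; k < 2 * n - 1; k++) {
		int first = (k < n) ? 0 : k - n + 1;
		int last = (k < n) ? k : n - 1;

		for (i = first; i <= last; i++) {
			uint64_t t = (uint64_t)a[i] * b[k - i];

			lo += t;
			if (lo < t)	// unsigned wrap-around signals a carry
				hi++;
		}
		r[k] = (BN_ULONG)lo;		// emit one 32-bit result word
		lo = (lo >> 32) | (hi << 32);	// shift the sum down one word
		hi = 0;
	}
	r[2 * n - 1] = (BN_ULONG)lo;	// what remains is the top word
}
```

 * With n = 8 this should yield the same r[0..15] as bn_mul_comba8,
 * which makes it usable as a cross-check in a test harness.
 */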
996*0Sstevel@tonic-gate
997*0Sstevel@tonic-gate.align	32
998*0Sstevel@tonic-gate
999*0Sstevel@tonic-gate.global bn_mul_comba4
1000*0Sstevel@tonic-gate/*
1001*0Sstevel@tonic-gate * void bn_mul_comba4(r,a,b)
1002*0Sstevel@tonic-gate * BN_ULONG *r,*a,*b;
1003*0Sstevel@tonic-gate */
1004*0Sstevel@tonic-gatebn_mul_comba4:
1005*0Sstevel@tonic-gate	save	%sp,FRAME_SIZE,%sp
1006*0Sstevel@tonic-gate	lduw	ap(0),a_0
1007*0Sstevel@tonic-gate	mov	1,t_2
1008*0Sstevel@tonic-gate	lduw	bp(0),b_0
1009*0Sstevel@tonic-gate	sllx	t_2,32,t_2	!=
1010*0Sstevel@tonic-gate	lduw	bp(1),b_1
1011*0Sstevel@tonic-gate	mulx	a_0,b_0,t_1	!mul_add_c(a[0],b[0],c1,c2,c3);
1012*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1013*0Sstevel@tonic-gate	stuw	t_1,rp(0)	!=!r[0]=c1;
1014*0Sstevel@tonic-gate
1015*0Sstevel@tonic-gate	lduw	ap(1),a_1
1016*0Sstevel@tonic-gate	mulx	a_0,b_1,t_1	!mul_add_c(a[0],b[1],c2,c3,c1);
1017*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1018*0Sstevel@tonic-gate	clr	c_3		!=
1019*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1020*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1021*0Sstevel@tonic-gate	lduw	ap(2),a_2
1022*0Sstevel@tonic-gate	mulx	a_1,b_0,t_1	!=!mul_add_c(a[1],b[0],c2,c3,c1);
1023*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1024*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1025*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1026*0Sstevel@tonic-gate	srlx	t_1,32,c_12	!=
1027*0Sstevel@tonic-gate	stuw	t_1,rp(1)	!r[1]=c2;
1028*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1029*0Sstevel@tonic-gate
1030*0Sstevel@tonic-gate	mulx	a_2,b_0,t_1	!mul_add_c(a[2],b[0],c3,c1,c2);
1031*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12	!=
1032*0Sstevel@tonic-gate	clr	c_3
1033*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1034*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1035*0Sstevel@tonic-gate	lduw	bp(2),b_2	!=
1036*0Sstevel@tonic-gate	mulx	a_1,b_1,t_1	!mul_add_c(a[1],b[1],c3,c1,c2);
1037*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1038*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1039*0Sstevel@tonic-gate	add	c_3,t_2,c_3	!=
1040*0Sstevel@tonic-gate	lduw	bp(3),b_3
1041*0Sstevel@tonic-gate	mulx	a_0,b_2,t_1	!mul_add_c(a[0],b[2],c3,c1,c2);
1042*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1043*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
1044*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1045*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1046*0Sstevel@tonic-gate	stuw	t_1,rp(2)	!r[2]=c3;
1047*0Sstevel@tonic-gate	or	c_12,c_3,c_12	!=
1048*0Sstevel@tonic-gate
1049*0Sstevel@tonic-gate	mulx	a_0,b_3,t_1	!mul_add_c(a[0],b[3],c1,c2,c3);
1050*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1051*0Sstevel@tonic-gate	clr	c_3
1052*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
1053*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1054*0Sstevel@tonic-gate	mulx	a_1,b_2,t_1	!mul_add_c(a[1],b[2],c1,c2,c3);
1055*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1056*0Sstevel@tonic-gate	bcs,a	%xcc,.+8	!=
1057*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1058*0Sstevel@tonic-gate	lduw	ap(3),a_3
1059*0Sstevel@tonic-gate	mulx	a_2,b_1,t_1	!mul_add_c(a[2],b[1],c1,c2,c3);
1060*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12	!=
1061*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1062*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1063*0Sstevel@tonic-gate	mulx	a_3,b_0,t_1	!mul_add_c(a[3],b[0],c1,c2,c3);
1064*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1	!=
1065*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1066*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1067*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1068*0Sstevel@tonic-gate	stuw	t_1,rp(3)	!=!r[3]=c1;
1069*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1070*0Sstevel@tonic-gate
1071*0Sstevel@tonic-gate	mulx	a_3,b_1,t_1	!mul_add_c(a[3],b[1],c2,c3,c1);
1072*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1073*0Sstevel@tonic-gate	clr	c_3		!=
1074*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1075*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1076*0Sstevel@tonic-gate	mulx	a_2,b_2,t_1	!mul_add_c(a[2],b[2],c2,c3,c1);
1077*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12	!=
1078*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1079*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1080*0Sstevel@tonic-gate	mulx	a_1,b_3,t_1	!mul_add_c(a[1],b[3],c2,c3,c1);
1081*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1	!=
1082*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1083*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1084*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1085*0Sstevel@tonic-gate	stuw	t_1,rp(4)	!=!r[4]=c2;
1086*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1087*0Sstevel@tonic-gate
1088*0Sstevel@tonic-gate	mulx	a_2,b_3,t_1	!mul_add_c(a[2],b[3],c3,c1,c2);
1089*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1090*0Sstevel@tonic-gate	clr	c_3		!=
1091*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1092*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1093*0Sstevel@tonic-gate	mulx	a_3,b_2,t_1	!mul_add_c(a[3],b[2],c3,c1,c2);
1094*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1	!=
1095*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1096*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1097*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1098*0Sstevel@tonic-gate	stuw	t_1,rp(5)	!=!r[5]=c3;
1099*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1100*0Sstevel@tonic-gate
1101*0Sstevel@tonic-gate	mulx	a_3,b_3,t_1	!mul_add_c(a[3],b[3],c1,c2,c3);
1102*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1103*0Sstevel@tonic-gate	srlx	t_1,32,c_12	!=
1104*0Sstevel@tonic-gate	stuw	t_1,rp(6)	!r[6]=c1;
1105*0Sstevel@tonic-gate	stuw	c_12,rp(7)	!r[7]=c2;
1106*0Sstevel@tonic-gate
1107*0Sstevel@tonic-gate	ret
1108*0Sstevel@tonic-gate	restore	%g0,%g0,%o0
1109*0Sstevel@tonic-gate
1110*0Sstevel@tonic-gate.type	bn_mul_comba4,#function
1111*0Sstevel@tonic-gate.size	bn_mul_comba4,(.-bn_mul_comba4)
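
/*
 * Reference sketch (an editorial aid, not part of the original module).
 * Each mul_add_c step above is the four-instruction idiom
 * mulx / addcc / bcs,a %xcc,.+8 / add c_3,t_2,c_3. Because the branch
 * is annulling, the add in its delay slot takes effect only when the
 * 64-bit addcc carried out, so c_3 collects carries already scaled by
 * t_2 = 2^32. The C model below mirrors that register usage. The struct
 * and function names are hypothetical; only c_12, c_3, t_1 and t_2 come
 * from the assembly.
 *

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of the accumulator registers.
struct acc {
	uint64_t c_12;	// low 64 bits of the column sum
	uint64_t c_3;	// carries out of c_12, pre-scaled by 2^32
};

static const uint64_t T_2 = (uint64_t)1 << 32;	// mov 1,t_2; sllx t_2,32,t_2

// mulx; addcc c_12,t_1,c_12; bcs,a %xcc,.+8; add c_3,t_2,c_3
static void acc_mul_add(struct acc *s, uint32_t a, uint32_t b)
{
	uint64_t t_1 = (uint64_t)a * b;	// mulx
	s->c_12 += t_1;			// addcc
	if (s->c_12 < t_1)		// delay slot runs only on carry
		s->c_3 += T_2;		// add c_3,t_2,c_3
}

// srlx t_1,32,c_12; stuw t_1,rp(k); or c_12,c_3,c_12; then clr c_3
static uint32_t acc_emit_word(struct acc *s)
{
	uint32_t word = (uint32_t)s->c_12;	// stuw keeps the low 32 bits
	s->c_12 = (s->c_12 >> 32) | s->c_3;	// srlx, then or
	s->c_3 = 0;				// cleared for the next column
	return word;
}

// Drives the two helpers through the seven columns of a 4x4 product.
static void model_mul_comba4(uint32_t r[8], const uint32_t a[4],
			     const uint32_t b[4])
{
	struct acc s = { 0, 0 };
	int i, k;

	assert(r != a && r != b);	// the model assumes no overlap
	for (k = 0; k < 7; k++) {
		for (i = 0; i < 4; i++)
			if (k - i >= 0 && k - i < 4)
				acc_mul_add(&s, a[i], b[k - i]);
		r[k] = acc_emit_word(&s);
	}
	r[7] = (uint32_t)s.c_12;	// stuw c_12,rp(7)
}
```

 * The or in acc_emit_word is safe because the shifted c_12 occupies
 * bits 0..31 while c_3 only ever has bits 32 and above set.
 */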
1112*0Sstevel@tonic-gate
1113*0Sstevel@tonic-gate.align	32
1114*0Sstevel@tonic-gate
1115*0Sstevel@tonic-gate.global bn_sqr_comba8
/*
 * void bn_sqr_comba8(r,a)
 * BN_ULONG *r,*a;
 */
1116*0Sstevel@tonic-gatebn_sqr_comba8:
1117*0Sstevel@tonic-gate	save	%sp,FRAME_SIZE,%sp
1118*0Sstevel@tonic-gate	mov	1,t_2
1119*0Sstevel@tonic-gate	lduw	ap(0),a_0
1120*0Sstevel@tonic-gate	sllx	t_2,32,t_2
1121*0Sstevel@tonic-gate	lduw	ap(1),a_1
1122*0Sstevel@tonic-gate	mulx	a_0,a_0,t_1	!sqr_add_c(a,0,c1,c2,c3);
1123*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1124*0Sstevel@tonic-gate	stuw	t_1,rp(0)	!r[0]=c1;
1125*0Sstevel@tonic-gate
1126*0Sstevel@tonic-gate	lduw	ap(2),a_2
1127*0Sstevel@tonic-gate	mulx	a_0,a_1,t_1	!=!sqr_add_c2(a,1,0,c2,c3,c1);
1128*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1129*0Sstevel@tonic-gate	clr	c_3
1130*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1131*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1132*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1133*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1134*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1135*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1136*0Sstevel@tonic-gate	stuw	t_1,rp(1)	!r[1]=c2;
1137*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1138*0Sstevel@tonic-gate
1139*0Sstevel@tonic-gate	mulx	a_2,a_0,t_1	!sqr_add_c2(a,2,0,c3,c1,c2);
1140*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1141*0Sstevel@tonic-gate	clr	c_3
1142*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1143*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1144*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1145*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1146*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1147*0Sstevel@tonic-gate	lduw	ap(3),a_3
1148*0Sstevel@tonic-gate	mulx	a_1,a_1,t_1	!sqr_add_c(a,1,c3,c1,c2);
1149*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1150*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1151*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1152*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1153*0Sstevel@tonic-gate	stuw	t_1,rp(2)	!r[2]=c3;
1154*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1155*0Sstevel@tonic-gate
1156*0Sstevel@tonic-gate	mulx	a_0,a_3,t_1	!sqr_add_c2(a,3,0,c1,c2,c3);
1157*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1158*0Sstevel@tonic-gate	clr	c_3
1159*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1160*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1161*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1162*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1163*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1164*0Sstevel@tonic-gate	lduw	ap(4),a_4
1165*0Sstevel@tonic-gate	mulx	a_1,a_2,t_1	!sqr_add_c2(a,2,1,c1,c2,c3);
1166*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1167*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1168*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1169*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1170*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1171*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1172*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1173*0Sstevel@tonic-gate	stuw	t_1,rp(3)	!r[3]=c1;
1174*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1175*0Sstevel@tonic-gate
1176*0Sstevel@tonic-gate	mulx	a_4,a_0,t_1	!sqr_add_c2(a,4,0,c2,c3,c1);
1177*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1178*0Sstevel@tonic-gate	clr	c_3
1179*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1180*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1181*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1182*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1183*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1184*0Sstevel@tonic-gate	mulx	a_3,a_1,t_1	!sqr_add_c2(a,3,1,c2,c3,c1);
1185*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1186*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1187*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1188*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1189*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1190*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1191*0Sstevel@tonic-gate	lduw	ap(5),a_5
1192*0Sstevel@tonic-gate	mulx	a_2,a_2,t_1	!sqr_add_c(a,2,c2,c3,c1);
1193*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1194*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1195*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1196*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1197*0Sstevel@tonic-gate	stuw	t_1,rp(4)	!r[4]=c2;
1198*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1199*0Sstevel@tonic-gate
1200*0Sstevel@tonic-gate	mulx	a_0,a_5,t_1	!sqr_add_c2(a,5,0,c3,c1,c2);
1201*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1202*0Sstevel@tonic-gate	clr	c_3
1203*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1204*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1205*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1206*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1207*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1208*0Sstevel@tonic-gate	mulx	a_1,a_4,t_1	!sqr_add_c2(a,4,1,c3,c1,c2);
1209*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1210*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1211*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1212*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1213*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1214*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1215*0Sstevel@tonic-gate	lduw	ap(6),a_6
1216*0Sstevel@tonic-gate	mulx	a_2,a_3,t_1	!sqr_add_c2(a,3,2,c3,c1,c2);
1217*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1218*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1219*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1220*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1221*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1222*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1223*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1224*0Sstevel@tonic-gate	stuw	t_1,rp(5)	!r[5]=c3;
1225*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1226*0Sstevel@tonic-gate
1227*0Sstevel@tonic-gate	mulx	a_6,a_0,t_1	!sqr_add_c2(a,6,0,c1,c2,c3);
1228*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1229*0Sstevel@tonic-gate	clr	c_3
1230*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1231*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1232*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1233*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1234*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1235*0Sstevel@tonic-gate	mulx	a_5,a_1,t_1	!sqr_add_c2(a,5,1,c1,c2,c3);
1236*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1237*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1238*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1239*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1240*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1241*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1242*0Sstevel@tonic-gate	mulx	a_4,a_2,t_1	!sqr_add_c2(a,4,2,c1,c2,c3);
1243*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1244*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1245*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1246*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1247*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1248*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1249*0Sstevel@tonic-gate	lduw	ap(7),a_7
1250*0Sstevel@tonic-gate	mulx	a_3,a_3,t_1	!=!sqr_add_c(a,3,c1,c2,c3);
1251*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1252*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1253*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1254*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1255*0Sstevel@tonic-gate	stuw	t_1,rp(6)	!r[6]=c1;
1256*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1257*0Sstevel@tonic-gate
1258*0Sstevel@tonic-gate	mulx	a_0,a_7,t_1	!sqr_add_c2(a,7,0,c2,c3,c1);
1259*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1260*0Sstevel@tonic-gate	clr	c_3
1261*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1262*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1263*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1264*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1265*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1266*0Sstevel@tonic-gate	mulx	a_1,a_6,t_1	!sqr_add_c2(a,6,1,c2,c3,c1);
1267*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1268*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1269*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1270*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1271*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1272*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1273*0Sstevel@tonic-gate	mulx	a_2,a_5,t_1	!sqr_add_c2(a,5,2,c2,c3,c1);
1274*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1275*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1276*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1277*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1278*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1279*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1280*0Sstevel@tonic-gate	mulx	a_3,a_4,t_1	!sqr_add_c2(a,4,3,c2,c3,c1);
1281*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1282*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1283*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1284*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1285*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1286*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1287*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1288*0Sstevel@tonic-gate	stuw	t_1,rp(7)	!r[7]=c2;
1289*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1290*0Sstevel@tonic-gate
1291*0Sstevel@tonic-gate	mulx	a_7,a_1,t_1	!sqr_add_c2(a,7,1,c3,c1,c2);
1292*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1293*0Sstevel@tonic-gate	clr	c_3
1294*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1295*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1296*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1297*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1298*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1299*0Sstevel@tonic-gate	mulx	a_6,a_2,t_1	!sqr_add_c2(a,6,2,c3,c1,c2);
1300*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1301*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1302*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1303*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1304*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1305*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1306*0Sstevel@tonic-gate	mulx	a_5,a_3,t_1	!sqr_add_c2(a,5,3,c3,c1,c2);
1307*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1308*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1309*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1310*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1311*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1312*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1313*0Sstevel@tonic-gate	mulx	a_4,a_4,t_1	!sqr_add_c(a,4,c3,c1,c2);
1314*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1315*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1316*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1317*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1318*0Sstevel@tonic-gate	stuw	t_1,rp(8)	!r[8]=c3;
1319*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1320*0Sstevel@tonic-gate
1321*0Sstevel@tonic-gate	mulx	a_2,a_7,t_1	!sqr_add_c2(a,7,2,c1,c2,c3);
1322*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1323*0Sstevel@tonic-gate	clr	c_3
1324*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1325*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1326*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1327*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1328*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1329*0Sstevel@tonic-gate	mulx	a_3,a_6,t_1	!sqr_add_c2(a,6,3,c1,c2,c3);
1330*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1331*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1332*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1333*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1334*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1335*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1336*0Sstevel@tonic-gate	mulx	a_4,a_5,t_1	!sqr_add_c2(a,5,4,c1,c2,c3);
1337*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1338*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1339*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1340*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1341*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1342*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1343*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1344*0Sstevel@tonic-gate	stuw	t_1,rp(9)	!r[9]=c1;
1345*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1346*0Sstevel@tonic-gate
1347*0Sstevel@tonic-gate	mulx	a_7,a_3,t_1	!sqr_add_c2(a,7,3,c2,c3,c1);
1348*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1349*0Sstevel@tonic-gate	clr	c_3
1350*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1351*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1352*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1353*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1354*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1355*0Sstevel@tonic-gate	mulx	a_6,a_4,t_1	!sqr_add_c2(a,6,4,c2,c3,c1);
1356*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1357*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1358*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1359*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1360*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1361*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1362*0Sstevel@tonic-gate	mulx	a_5,a_5,t_1	!sqr_add_c(a,5,c2,c3,c1);
1363*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1364*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1365*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1366*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1367*0Sstevel@tonic-gate	stuw	t_1,rp(10)	!r[10]=c2;
1368*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1369*0Sstevel@tonic-gate
1370*0Sstevel@tonic-gate	mulx	a_4,a_7,t_1	!sqr_add_c2(a,7,4,c3,c1,c2);
1371*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1372*0Sstevel@tonic-gate	clr	c_3
1373*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1374*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1375*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1376*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1377*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1378*0Sstevel@tonic-gate	mulx	a_5,a_6,t_1	!sqr_add_c2(a,6,5,c3,c1,c2);
1379*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1380*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1381*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1382*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1383*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1384*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1385*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1386*0Sstevel@tonic-gate	stuw	t_1,rp(11)	!r[11]=c3;
1387*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1388*0Sstevel@tonic-gate
1389*0Sstevel@tonic-gate	mulx	a_7,a_5,t_1	!sqr_add_c2(a,7,5,c1,c2,c3);
1390*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1391*0Sstevel@tonic-gate	clr	c_3
1392*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1393*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1394*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1395*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1396*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1397*0Sstevel@tonic-gate	mulx	a_6,a_6,t_1	!sqr_add_c(a,6,c1,c2,c3);
1398*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1399*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1400*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1401*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1402*0Sstevel@tonic-gate	stuw	t_1,rp(12)	!r[12]=c1;
1403*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1404*0Sstevel@tonic-gate
1405*0Sstevel@tonic-gate	mulx	a_6,a_7,t_1	!sqr_add_c2(a,7,6,c2,c3,c1);
1406*0Sstevel@tonic-gate	addcc	c_12,t_1,c_12
1407*0Sstevel@tonic-gate	clr	c_3
1408*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1409*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1410*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1411*0Sstevel@tonic-gate	bcs,a	%xcc,.+8
1412*0Sstevel@tonic-gate	add	c_3,t_2,c_3
1413*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1414*0Sstevel@tonic-gate	stuw	t_1,rp(13)	!r[13]=c2;
1415*0Sstevel@tonic-gate	or	c_12,c_3,c_12
1416*0Sstevel@tonic-gate
1417*0Sstevel@tonic-gate	mulx	a_7,a_7,t_1	!sqr_add_c(a,7,c3,c1,c2);
1418*0Sstevel@tonic-gate	addcc	c_12,t_1,t_1
1419*0Sstevel@tonic-gate	srlx	t_1,32,c_12
1420*0Sstevel@tonic-gate	stuw	t_1,rp(14)	!r[14]=c3;
1421*0Sstevel@tonic-gate	stuw	c_12,rp(15)	!r[15]=c1;
1422*0Sstevel@tonic-gate
1423*0Sstevel@tonic-gate	ret
1424*0Sstevel@tonic-gate	restore	%g0,%g0,%o0
1425*0Sstevel@tonic-gate
1426*0Sstevel@tonic-gate.type	bn_sqr_comba8,#function
1427*0Sstevel@tonic-gate.size	bn_sqr_comba8,(.-bn_sqr_comba8)
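
/*
 * Reference sketch (an editorial aid, not part of the original module).
 * In the squaring routine above, each sqr_add_c2(a,i,j,...) step
 * multiplies once and then accumulates t_1 twice, while sqr_add_c(a,i,...)
 * accumulates the diagonal product once. The C model below is a minimal
 * sketch of that scheme, assuming 32-bit BN_ULONG words and that r does
 * not overlap a. The name ref_sqr_comba and the lo/hi accumulator split
 * are hypothetical.
 *

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BN_ULONG;	// assumption: 32-bit words

// Hypothetical reference model: r[0..2n-1] = a[0..n-1] squared.
static void ref_sqr_comba(BN_ULONG *r, const BN_ULONG *a, int n)
{
	uint64_t lo = 0;	// low 64 bits of the running column sum
	uint64_t hi = 0;	// carries out of lo, in units of 2^64
	int i, k, pass;

	assert(n > 0);
	for (k = 0; k < 2 * n - 1; k++) {
		int first = (k < n) ? 0 : k - n + 1;

		// Visit each unordered pair (i, k - i) exactly once.
		for (i = first; i <= k - i; i++) {
			uint64_t t = (uint64_t)a[i] * a[k - i];
			// Off-diagonal products count twice (sqr_add_c2),
			// the diagonal product once (sqr_add_c).
			int times = (i == k - i) ? 1 : 2;

			for (pass = 0; pass < times; pass++) {
				lo += t;
				if (lo < t)	// wrap-around signals a carry
					hi++;
			}
		}
		r[k] = (BN_ULONG)lo;
		lo = (lo >> 32) | (hi << 32);
		hi = 0;
	}
	r[2 * n - 1] = (BN_ULONG)lo;
}
```

 * Adding the product twice, rather than shifting it left by one, keeps
 * every intermediate value within 64 bits plus a carry count, which is
 * the same constraint the assembly works under.
 */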
1428*0Sstevel@tonic-gate
1429*0Sstevel@tonic-gate.align	32
1430*0Sstevel@tonic-gate
.global bn_sqr_comba4
/*
 * void bn_sqr_comba4(r,a)
 * BN_ULONG *r,*a;
 *
 * Computes r[0..7] = a[0..3]^2 on 32-bit BN_ULONG words.
 */
bn_sqr_comba4:
	save	%sp,FRAME_SIZE,%sp
	mov	1,t_2
	lduw	ap(0),a_0
	sllx	t_2,32,t_2	!t_2=1<<32, the carry unit accumulated in c_3
	lduw	ap(1),a_1
	mulx	a_0,a_0,t_1	!sqr_add_c(a,0,c1,c2,c3);
	srlx	t_1,32,c_12
	stuw	t_1,rp(0)	!r[0]=c1;

	lduw	ap(2),a_2
	mulx	a_0,a_1,t_1	!sqr_add_c2(a,1,0,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8	!delay slot runs only on 64-bit carry, else annulled
	add	c_3,t_2,c_3	!c_3+=1<<32
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(1)	!r[1]=c2;
	or	c_12,c_3,c_12	!fold saved carries into the next column

	mulx	a_2,a_0,t_1	!sqr_add_c2(a,2,0,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	lduw	ap(3),a_3
	mulx	a_1,a_1,t_1	!sqr_add_c(a,1,c3,c1,c2);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(2)	!r[2]=c3;
	or	c_12,c_3,c_12

	mulx	a_0,a_3,t_1	!sqr_add_c2(a,3,0,c1,c2,c3);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_1,a_2,t_1	!sqr_add_c2(a,2,1,c1,c2,c3);
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(3)	!r[3]=c1;
	or	c_12,c_3,c_12

	mulx	a_3,a_1,t_1	!sqr_add_c2(a,3,1,c2,c3,c1);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,c_12
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	mulx	a_2,a_2,t_1	!sqr_add_c(a,2,c2,c3,c1);
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(4)	!r[4]=c2;
	or	c_12,c_3,c_12

	mulx	a_2,a_3,t_1	!sqr_add_c2(a,3,2,c3,c1,c2);
	addcc	c_12,t_1,c_12
	clr	c_3
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	addcc	c_12,t_1,t_1
	bcs,a	%xcc,.+8
	add	c_3,t_2,c_3
	srlx	t_1,32,c_12
	stuw	t_1,rp(5)	!r[5]=c3;
	or	c_12,c_3,c_12

	mulx	a_3,a_3,t_1	!sqr_add_c(a,3,c1,c2,c3);
	addcc	c_12,t_1,t_1
	srlx	t_1,32,c_12
	stuw	t_1,rp(6)	!r[6]=c1;
	stuw	c_12,rp(7)	!r[7]=c2;

	ret
	restore	%g0,%g0,%o0

.type	bn_sqr_comba4,#function
.size	bn_sqr_comba4,(.-bn_sqr_comba4)
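
/*
 * Reference sketch, not part of the original module: a portable C model of
 * what bn_sqr_comba4 above appears to compute, assuming 32-bit BN_ULONG
 * words. The names sqr_comba4_model, acc_add and next_column are
 * hypothetical and exist only for illustration. The model mirrors the
 * register scheme of the assembly: c_12 is the 64-bit column accumulator
 * and c_3 collects carries out of bit 63 in units of 1<<32.
 *
```c
#include <stdint.h>

// Add one 64-bit product into the column accumulator.
// Models: addcc c_12,t_1,c_12 ; bcs,a %xcc,.+8 ; add c_3,t_2,c_3
static void acc_add(uint64_t *c_12, uint64_t *c_3, uint64_t t)
{
    *c_12 += t;
    if (*c_12 < t)                      // unsigned wrap means carry out of bit 63
        *c_3 += (uint64_t)1 << 32;      // t_2 in the assembly
}

// Emit the low word of the column and move on to the next column.
// Models: stuw t_1,rp(k) ; srlx t_1,32,c_12 ; or c_12,c_3,c_12 ; clr c_3
static uint32_t next_column(uint64_t *c_12, uint64_t *c_3)
{
    uint32_t word = (uint32_t)*c_12;
    *c_12 = (*c_12 >> 32) | *c_3;       // high half is < 2^32, so OR acts as ADD
    *c_3 = 0;
    return word;
}

// r[0..7] = a[0..3]^2, column by column (comba order).
static void sqr_comba4_model(uint32_t r[8], const uint32_t a[4])
{
    uint64_t c_12 = 0, c_3 = 0;

    for (int k = 0; k < 7; k++) {
        for (int i = 0; i < 4; i++) {
            int j = k - i;
            if (j < 0 || j > 3)
                continue;
            // Visiting both (i,j) and (j,i) adds each cross product twice,
            // which is what sqr_add_c2 does; i == j is the sqr_add_c case.
            acc_add(&c_12, &c_3, (uint64_t)a[i] * a[j]);
        }
        r[k] = next_column(&c_12, &c_3);
    }
    r[7] = (uint32_t)c_12;              // top word; no further carry is possible
}
```
 *
 * The loop trades the unrolled schedule of the assembly for readability, so
 * it is useful for checking results rather than for speed.
 */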

.align	32