dnl  Alpha ev67 mpn_popcount -- mpn bit population count.

dnl  Copyright 2003, 2005 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.
dnl
dnl  The GNU MP Library is free software; you can redistribute it and/or
dnl  modify it under the terms of the GNU Lesser General Public License as
dnl  published by the Free Software Foundation; either version 3 of the
dnl  License, or (at your option) any later version.
dnl
dnl  The GNU MP Library is distributed in the hope that it will be useful,
dnl  but WITHOUT ANY WARRANTY; without even the implied warranty of
dnl  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
dnl  Lesser General Public License for more details.
dnl
dnl  You should have received a copy of the GNU Lesser General Public License
dnl  along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.

include(`../config.m4')


C ev67: 1.5 cycles/limb


C unsigned long mpn_popcount (mp_srcptr src, mp_size_t size);
C
C This schedule seems necessary for the full 1.5 c/l: the IQ can't quite hide
C all latencies, so the addq's must be deferred to the next iteration.
C
C Since we need just 3 instructions per limb, further unrolling could approach
C 1.0 c/l.
C
C The main loop processes two limbs at a time.  An odd size is handled by
C processing src[0] at the start.  If the size is even, that result is
C discarded and src[0] is processed again by the main loop.
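C
C As a reading aid, the control flow described above can be modelled roughly
C in C as below.  This is only a sketch: it assumes size >= 1, popc() is a
C hypothetical stand-in for the ctpop instruction (not a GMP function), and
C it adds each pair of counts immediately, whereas the real loop defers
C those additions to the following iteration.
C
C	total = popc (src[0]);
C	if ((size & 1) == 0)
C	  total = 0;		/* even size: discard, loop redoes src[0] */
C	src += size & 1;	/* odd size: src[0] already counted */
C	for (n = size >> 1; n != 0; n--)
C	  {
C	    total += popc (src[0]) + popc (src[1]);
C	    src += 2;
C	  }
C	return total;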
C

ASM_START()
PROLOGUE(mpn_popcount)

        C r16   src
        C r17   size

        ldq     r0, 0(r16)              C L0  src[0]
        and     r17, 1, r8              C U1  1 if size odd
        srl     r17, 1, r17             C U0  size, limb pairs

        s8addq  r8, r16, r16            C L1  src++ if size odd
        ctpop   r0, r0                  C U0
        beq     r17, L(one)             C U1  if size==1

        cmoveq  r8, r31, r0             C L   discard first limb if size even
        clr     r3                      C L

        clr     r4                      C L
        unop                            C U
        unop                            C L
        unop                            C U


        ALIGN(16)
L(top):
        C r0    total accumulating
        C r3    pop 0
        C r4    pop 1
        C r16   src, incrementing
        C r17   size, decrementing

        ldq     r1, 0(r16)              C L
        ldq     r2, 8(r16)              C L
        lda     r16, 16(r16)            C U
        lda     r17, -1(r17)            C U

        addq    r0, r3, r0              C L
        addq    r0, r4, r0              C L
        ctpop   r1, r3                  C U0
        ctpop   r2, r4                  C U0

        ldl     r31, 512(r16)           C L   prefetch
        bne     r17, L(top)             C U


        addq    r0, r3, r0              C L
        addq    r0, r4, r0              C U
L(one):
        ret     r31, (r26), 1           C L0

EPILOGUE()
ASM_END()