Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.



This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue in the two cycles after a
store.  On the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free (non-blocking) cache, so it helps to schedule
each load as far ahead as possible of the instruction that depends on it.

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100 using the
   instructions below, although some software pipelining is needed to hide
   the xmpyu-to-fstds delay:

        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)

        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib   Loop

2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100 using the instructions below.  With proper
   software pipelining at the unrolling level shown, the speed would be 8
   cycles/limb.

        fldds   s1_ptr
        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        addc
        addc
        addc
        addc
        addc    %r0,%r0,cy-limb

        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        add
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib

3. For the PA8000 we have to stick to 32-bit limbs until compiler support
   emerges.  But we want to use 64-bit operations whenever possible, in
   particular for loads and stores.  mpn_add_n can be handled efficiently
   by rotating when s1 and s2 are aligned, and by masking and bit-field
   insertion when they are not.  This should double the speed compared to
   the current code.



LABEL SYNTAX

The HP-UX assembler expects labels to start in column 0, with no colon:

        L$loop  ldws,mb -4(0,%r25),%r22

Gas on hppa GNU/Linux, however, requires a colon:

        L$loop: ldws,mb -4(0,%r25),%r22

This difference is handled by the LDEF() macro from asm-defs.m4.  An
alternative would be to use ".label", which both assemblers accept:

                .label  L$loop
                ldws,mb -4(0,%r25),%r22

but that is not as nice to look at, at least not if you are used to
assembler code having labels in column 0.



REFERENCES

Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998,
part number 92432-90012.



----------------
Local variables:
mode: text
fill-column: 76
End:
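For readers who want the arithmetic behind the mpn_mul_1 and mpn_addmul_1
instruction listings under STATUS, here is a minimal portable C sketch of
what mpn_addmul_1 computes with 32-bit limbs: {rp,n} += {up,n} * v, with the
carry-out limb returned.  The names ref_addmul_1 and limb_t are hypothetical
and chosen only for illustration.  This is not GMP's code and it models none
of the scheduling discussed above.  The comments relate each step loosely to
the xmpyu, ldws and addc instructions in the listings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb type, matching the 32-bit limbs used here. */
typedef uint32_t limb_t;

/* Hypothetical reference version of mpn_addmul_1:
   {rp,n} += {up,n} * v, returning the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32->64 unsigned product, the job xmpyu does in an FP register. */
        uint64_t p = (uint64_t)up[i] * v;
        /* Low and high words, as read back from the stack by a pair of ldws. */
        limb_t lo = (limb_t)p;
        limb_t hi = (limb_t)(p >> 32);
        /* Add the old result limb and the carry-in (the add/addc chain).
           lo + cy + rp[i] fits in 64 bits; its upper part is at most 2. */
        uint64_t s = (uint64_t)lo + cy + rp[i];
        rp[i] = (limb_t)s;
        /* hi + (s >> 32) is the high word of up[i]*v + rp[i] + cy, which is
           below 2^64, so this sum cannot overflow 32 bits. */
        cy = hi + (limb_t)(s >> 32);
    }
    return cy;
}
```

As an example, with rp = {0xFFFFFFFF, 0xFFFFFFFF}, up = {0xFFFFFFFF,
0xFFFFFFFF} and v = 0xFFFFFFFF, the result limbs become {0, 0xFFFFFFFF} and
the returned carry is 0xFFFFFFFF, which together form 2^96 - 2^32.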