Copyright 2002, 2005 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.



This directory contains assembly code for nails-enabled 21264.  The code is not
very well optimized.

For addmul_N, as N grows larger, we could make multiple loads together, then do
about 3.3 i/c.  10 cycles after the last load, we can increase to 4 i/c.  This
would surely allow addmul_4 to run at 2 c/l, but the same should be possible
also for addmul_3 and perhaps even addmul_2.


              current        fair                  best
Routine       c/l   unroll   c/l    unroll         c/l   i/c
mul_1         3.25           2.75                  2.75  3.273
addmul_1      4.0   4        3.5    4       14     3.25  3.385
addmul_2      4.0   1        2.5    2       10     2.25  3.333
addmul_3      3.0   1        2.33   2       14     2     3.333
addmul_4      2.5   1        2.125  2       17     2     3.135

addmul_5                     2      1       10
addmul_6                     2      1       12
addmul_7                     2      1       14

(The "best" column doesn't account for bookkeeping instructions and
thereby assumes infinite unrolling.)

Basecase usages:

1     addmul_1
2     addmul_2
3     addmul_3
4     addmul_4
5     addmul_3 + addmul_2     2.3998
6     addmul_4 + addmul_2
7     addmul_4 + addmul_3
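
The 2.3998 figure next to the 5-limb basecase is not explained in these notes.
One reading that reproduces it, offered here only as an assumption, is an
average of the "fair" c/l values in which each addmul_N is weighted by N, with
2.333 used for addmul_3 instead of the rounded 2.33.  The Python sketch below
works through that arithmetic and also shows how a c/l and an i/c figure
combine into instructions per limb.  The names FAIR_CL, BASECASE, combined_cl
and insns_per_limb are made up for this illustration and are not part of GMP.

```python
# Fair-column c/l values copied from the table above.  2.333 is used for
# addmul_3 (the table shows it rounded to 2.33); this is an assumption.
FAIR_CL = {1: 3.5, 2: 2.5, 3: 2.333, 4: 2.125}

# Basecase usages from the list above: size -> addmul_N routines used.
BASECASE = {
    1: (1,),
    2: (2,),
    3: (3,),
    4: (4,),
    5: (3, 2),
    6: (4, 2),
    7: (4, 3),
}


def combined_cl(parts, cl=FAIR_CL):
    """Average c/l of a combination, weighting each addmul_N by N.

    Assumes the table's c/l figures are already normalised per multiplier
    limb, so addmul_N contributes N * cl[N] cycles.
    """
    total_cycles = sum(n * cl[n] for n in parts)
    return total_cycles / sum(parts)


def insns_per_limb(cl, ic):
    """Instructions per limb implied by cycles/limb and instructions/cycle."""
    return cl * ic


if __name__ == "__main__":
    for size, parts in BASECASE.items():
        names = " + ".join(f"addmul_{n}" for n in parts)
        print(f"{size}  {names:<22} {combined_cl(parts):.4f}")
```

Under the same assumption, the sizes 6 and 7 entries can be filled in the same
way; whether the weighting matches what was originally intended cannot be
confirmed from this file alone.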