Copyright 2002, 2005 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.





This directory contains assembly code for the Alpha 21264 (EV6) built with
nails enabled.  The code is not very well optimized.
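
As background on the term: in a nails build each limb carries only
GMP_NUMB_BITS of data and keeps its top GMP_NAIL_BITS bits zero, which can
make carry handling cheaper on some processors.  The Python model below is
only a sketch of the arithmetic that addmul_1 performs under that
representation, not a transcription of the assembly in this directory.  The
64-bit limb follows from the Alpha target; the nail width of 4 bits and the
helper names (to_limbs, from_limbs) are assumptions made for the example.

```python
LIMB_BITS = 64                      # Alpha limbs are 64 bits wide
NAIL_BITS = 4                       # assumed nail width, for illustration only
NUMB_BITS = LIMB_BITS - NAIL_BITS   # data bits actually used per limb
NUMB_MASK = (1 << NUMB_BITS) - 1


def to_limbs(value, n):
    """Split a non-negative integer into n limbs, least significant first."""
    limbs = []
    for _ in range(n):
        limbs.append(value & NUMB_MASK)
        value >>= NUMB_BITS
    return limbs


def from_limbs(limbs):
    """Recombine limbs (least significant first) into an integer."""
    value = 0
    for limb in reversed(limbs):
        value = (value << NUMB_BITS) | limb
    return value


def addmul_1(rp, up, v):
    """Model of rp[] += up[] * v; returns the carry-out limb."""
    carry = 0
    for i in range(len(up)):
        t = rp[i] + up[i] * v + carry
        rp[i] = t & NUMB_MASK        # low NUMB_BITS stay in the limb
        carry = t >> NUMB_BITS       # everything above moves to the next limb
    return carry


r = to_limbs(0x1234_5678_9ABC_DEF0_1122_3344, 2)
u = to_limbs(0xFFFF_FFFF_FFFF_FFFF_FFFF_FFFF, 2)
v = NUMB_MASK                       # largest single-limb multiplier
expected = from_limbs(r) + from_limbs(u) * v
carry = addmul_1(r, u, v)
print(from_limbs(r) + (carry << (2 * NUMB_BITS)) == expected)
```

Every limb written back by the model has its nail bits clear, and the carry
out itself fits in NUMB_BITS.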

For addmul_N, as N grows larger, we could issue several loads together and
then sustain about 3.3 i/c (instructions per cycle).  Ten cycles after the
last load, we could increase to 4 i/c.  This would surely allow addmul_4 to
run at 2 c/l (cycles per limb), and the same should also be possible for
addmul_3 and perhaps even addmul_2.


                current         fair            best
Routine         c/l     unroll  c/l     unroll  c/l  i/c
mul_1           3.25            2.75            2.75 3.273
addmul_1        4.0     4       3.5     4 14    3.25 3.385
addmul_2        4.0     1       2.5     2 10    2.25 3.333
addmul_3        3.0     1       2.33    2 14    2    3.333
addmul_4        2.5     1       2.125   2 17    2    3.135

addmul_5                        2       1 10
addmul_6                        2       1 12
addmul_7                        2       1 14

(The "best" column doesn't account for bookkeeping instructions and
thereby assumes infinite unrolling.)
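
Reading c/l as cycles per limb and i/c as instructions per cycle, the
product of the two "best" figures gives the number of instructions issued
per limb.  The short Python sketch below just does that multiplication for
the rows of the table; the names BEST and instructions_per_limb are made up
for the example, and interpreting the product this way is an inference from
the table rather than something this file states.

```python
# "best" column from the table: routine -> (c/l, i/c)
BEST = {
    "mul_1":    (2.75, 3.273),
    "addmul_1": (3.25, 3.385),
    "addmul_2": (2.25, 3.333),
    "addmul_3": (2.0,  3.333),
    "addmul_4": (2.0,  3.135),
}


def instructions_per_limb(cycles_per_limb, instr_per_cycle):
    """Cycles per limb times instructions per cycle."""
    return cycles_per_limb * instr_per_cycle


for routine, (cl, ic) in BEST.items():
    print(f"{routine:<9} {instructions_per_limb(cl, ic):6.2f} instructions/limb")
```

For mul_1 and addmul_1 the products come out very close to 9 and 11
instructions per limb.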

Basecase usages:

1	 addmul_1
2	 addmul_2
3	 addmul_3
4	 addmul_4
5	 addmul_3 + addmul_2	2.3998
6	 addmul_4 + addmul_2
7	 addmul_4 + addmul_3
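
The figure 2.3998 next to the 5-limb case is consistent with a limb-weighted
average of the "fair" c/l values of the two routines involved, taking
addmul_3 as 2.333 rather than the rounded 2.33 shown in the table.  That
reading is an inference, not something this file states.  A small Python
sketch of the calculation, with illustrative names (FAIR_CL, combined_cl):

```python
# "fair" c/l figures from the table; addmul_3 is taken as 2.333, an assumed
# less-rounded form of the 2.33 printed there.
FAIR_CL = {1: 3.5, 2: 2.5, 3: 2.333, 4: 2.125}


def combined_cl(parts):
    """Limb-weighted average c/l for a basecase built from addmul_k calls."""
    total_cycles = sum(k * FAIR_CL[k] for k in parts)
    return total_cycles / sum(parts)


print(round(combined_cl([3, 2]), 4))   # 5 limbs: addmul_3 + addmul_2
print(round(combined_cl([4, 2]), 4))   # 6 limbs: addmul_4 + addmul_2
print(round(combined_cl([4, 3]), 4))   # 7 limbs: addmul_4 + addmul_3
```

The first line reproduces the 2.3998 listed above; the other two are simply
the same formula applied to the 6- and 7-limb combinations.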