xref: /netbsd-src/external/lgpl3/gmp/dist/doc/projects.html (revision 946379e7b37692fc43f68eb0d1c10daa0a7f3b6c)
1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
2<html>
3<head>
4  <title>GMP Development Projects</title>
5  <link rel="shortcut icon" href="favicon.ico">
6  <link rel="stylesheet" href="gmp.css">
7  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
8</head>
9
10<center>
11  <h1>
12    GMP Development Projects
13  </h1>
14</center>
15
16<font size=-1>
17<pre>
18Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2008, 2009, 2010, 2011
19Free Software Foundation, Inc.
20
21This file is part of the GNU MP Library.
22
23The GNU MP Library is free software; you can redistribute it and/or modify
24it under the terms of the GNU Lesser General Public License as published
25by the Free Software Foundation; either version 3 of the License, or (at
26your option) any later version.
27
28The GNU MP Library is distributed in the hope that it will be useful, but
29WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
30or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
31License for more details.
32
33You should have received a copy of the GNU Lesser General Public License
34along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.
35</pre>
36</font>
37
38<hr>
39<!-- NB. timestamp updated automatically by emacs -->
40  This file current as of 5 Dec 2011.  An up-to-date version is available at
41  <a href="http://gmplib.org/projects.html">http://gmplib.org/projects.html</a>.
42  Please send comments about this page to gmp-devel<font>@</font>gmplib.org.
43
44<p> This file lists projects suitable for volunteers.  Please see the
45    <a href="tasks.html">tasks file</a> for smaller tasks.
46
47<p> If you want to work on any of the projects below, please let
48    gmp-devel<font>@</font>gmplib.org know.  If you want to help with a project
49    that already somebody else is working on, you will get in touch through
50    gmp-devel<font>@</font>gmplib.org.  (There are no email addresses of
51    volunteers below, due to spamming problems.)
52
53<ul>
54<li> <strong>Faster multiplication</strong>
55
56  <ol>
57
58    <li> Work on the algorithm selection code for unbalanced multiplication.
59
60    <li> Implement an FFT variant computing the coefficients mod m different
61	 limb size primes of the form l*2^k+1. i.e., compute m separate FFTs.
62	 The wanted coefficients will at the end be found by lifting with CRT
63	 (Chinese Remainder Theorem).  If we let m = 3, i.e., use 3 primes, we
64	 can split the operands into coefficients at limb boundaries, and if
65	 our machine uses b-bit limbs, we can multiply numbers with close to
66	 2^b limbs without coefficient overflow.  For smaller multiplication,
67	 we might perhaps let m = 1, and instead of splitting our operands at
68	 limb boundaries, split them in much smaller pieces.  We might also use
69	 4 or more primes, and split operands into bigger than b-bit chunks.
70	 By using more primes, the gain in shorter transform length, but lose
71	 in having to do more FFTs, but that is a slight total save.  We then
72	 lose in more expensive CRT. <br><br>
73
74	 <p> [We now have two implementations of this algorithm, one by Tommy
75	 F�rnqvist and one by Niels M�ller.]
76
77    <li> Work on short products.  Our mullo and mulmid are probably K, but we
78         lack mulhi.
79
80  </ol>
81
82  <p> Another possibility would be an optimized cube.  In the basecase that
83      should definitely be able to save cross-products in a similar fashion to
84      squaring, but some investigation might be needed for how best to adapt
85      the higher-order algorithms.  Not sure whether cubing or further small
86      powers have any particularly important uses though.
87
88
89<li> <strong>Assembly routines</strong>
90
91  <p> Write new and improve existing assembly routines.  The tests/devel
92      programs and the tune/speed.c and tune/many.pl programs are useful for
93      testing and timing the routines you write.  See the README files in those
94      directories for more information.
95
96  <p> Please make sure your new routines are fast for these three situations:
97      <ol>
98	<li> Small operands of less than, say, 10 limbs.
99	<li> Medium size operands, that fit into the cache.
100	<li> Huge operands that does not fit into the cache.
101      </ol>
102
103  <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
104      mpn_sqr_basecase.  The latter two don't exist for all machines, while
105      mpn_addmul_1 exists for almost all machines.
106
107  <p> Standard techniques for these routines are unrolling, software
108      pipelining, and specialization for common operand values.  For machines
109      with poor integer multiplication, it is sometimes possible to remedy the
110      situation using floating-point operations or SIMD operations such as MMX
111      (x86) (x86), SSE (x86), VMX (PowerPC), VIS (Sparc).
112
113  <p> Using floating-point operations is interesting but somewhat tricky.
114      Since IEEE double has 53 bit of mantissa, one has to split the operands
115      in small pieces, so that no intermediates are greater than 2^53.  For
116      32-bit computers, splitting one operand into 16-bit pieces works.  For
117      64-bit machines, one operand can be split into 21-bit pieces and the
118      other into 32-bit pieces.  (A 64-bit operand can be split into just three
119      21-bit pieces if one allows the split operands to be negative!)
120
121
122<li> <strong>Faster sqrt</strong>
123
124  <p> The current code uses divisions, which are reasonably fast, but it'd be
125      possible to use only multiplications by computing 1/sqrt(A) using this
126      iteration:
127      <pre>
128				    2
129		   x   = x  (3 &minus; A x )/2
130		    i+1	  i	    i  </pre>
131      The square root can then be computed like this:
132      <pre>
133		     sqrt(A) = A x
134				  n  </pre>
135  <p> That final multiply might be the full size of the input (though it might
136      only need the high half of that), so there may or may not be any speedup
137      overall.
138
139  <p> We should probably allow a special exponent-like parameter, to speed
140      computations of a precise square root of a small number in mpf and mpfr.
141
142
143<li> <strong>Nth root</strong>
144
145  <p> Improve mpn_rootrem.  The current code is not too bad, but its time
146      complexity is a function of the input, while it is possible to make
147      the <i>average</i> complexity a function of the output.
148
149
150<li> <strong>Fat binaries</strong>
151
152  <p> Add more functions to the set of fat functions.
153
154  <p> The speed of multipliciaton is today highly dependent on combination
155  functions like <code>addlsh1_n</code>.  A fat binary will never use any such
156  functions, since they are classified as optional.  Ideally, we should use
157  them, but making the current compile-time selections of optional functions
158  become run-time selections for fat binaries.
159
160  <p> If we make fat binaries work really well, we should move away frm tehe
161  current configure scheme (at least by default) and instead include all code
162  always.
163
164
165<li> <strong>Exceptions</strong>
166
167  <p> Some sort of scheme for exceptions handling would be desirable.
168      Presently the only thing documented is that divide by zero in GMP
169      functions provokes a deliberate machine divide by zero (on those systems
170      where such a thing exists at least).  The global <code>gmp_errno</code>
171      is not actually documented, except for the old <code>gmp_randinit</code>
172      function.  Being currently just a plain global means it's not
173      thread-safe.
174
175  <p> The basic choices for exceptions are returning an error code or having a
176      handler function to be called.  The disadvantage of error returns is they
177      have to be checked, leading to tedious and rarely executed code, and
178      strictly speaking such a scheme wouldn't be source or binary compatible.
179      The disadvantage of a handler function is that a <code>longjmp</code> or
180      similar recovery from it may be difficult.  A combination would be
181      possible, for instance by allowing the handler to return an error code.
182
183  <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
184      normally be detected at the start of functions, so exception handling
185      would have a clean state.  What's worth considering though is that the
186      GMP function detecting the exception may have been called via some third
187      party library or self contained application module, and hence have
188      various bits of state to be cleaned up above it.  It'd be highly
189      desirable for an exceptions scheme to allow for such cleanups.
190
191  <p> The C++ destructor mechanism could help with cleanups both internally and
192      externally, but being a plain C library we don't want to depend on that.
193
194  <p> A C++ <code>throw</code> might be a good optional extra exceptions
195      mechanism, perhaps under a build option.  For
196      GCC <code>-fexceptions</code> will add the necessary frame information to
197      plain C code, or GMP could be compiled as C++.
198
199  <p> Out-of-memory exceptions are expected to be handled by the
200      <code>mp_set_memory_functions</code> routines, rather than being a
201      prospective part of divide-by-zero etc.  Some similar considerations
202      apply but what differs is that out-of-memory can arise deep within GMP
203      internals.  Even fundamental routines like <code>mpn_add_n</code> and
204      <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
205      vector systems).  Allowing for an error code return would require an
206      awful lot of checking internally.  Perhaps it'd still be worthwhile, but
207      it'd be a lot of changes and the extra code would probably be rather
208      rarely executed in normal usages.
209
210  <p> A <code>longjmp</code> recovery for out-of-memory will currently, in
211      general, lead to memory leaks and may leave GMP variables operated on in
212      inconsistent states.  Maybe it'd be possible to record recovery
213      information for use by the relevant allocate or reallocate function, but
214      that too would be a lot of changes.
215
216  <p> One scheme for out-of-memory would be to note that all GMP allocations go
217      through the <code>mp_set_memory_functions</code> routines.  So if the
218      application has an intended <code>setjmp</code> recovery point it can
219      record memory activity by GMP and abandon space allocated and variables
220      initialized after that point.  This might be as simple as directing the
221      allocation functions to a separate pool, but in general would have the
222      disadvantage of needing application-level bookkeeping on top of the
223      normal system <code>malloc</code>.  An advantage however is that it needs
224      nothing from GMP itself and on that basis doesn't burden applications not
225      needing recovery.  Note that there's probably some details to be worked
226      out here about reallocs of existing variables, and perhaps about copying
227      or swapping between "permanent" and "temporary" variables.
228
229  <p> Applications desiring a fine-grained error control, for instance a
230      language interpreter, would very possibly not be well served by a scheme
231      requiring <code>longjmp</code>.  Wrapping every GMP function call with a
232      <code>setjmp</code> would be very inconvenient.
233
234  <p> Another option would be to let <code>mpz_t</code> etc hold a sort of NaN,
235      a special value indicating an out-of-memory or other failure.  This would
236      be similar to NaNs in mpfr.  Unfortunately such a scheme could only be
237      used by programs prepared to handle such special values, since for
238      instance a program waiting for some condition to be satisfied could
239      become an infinite loop if it wasn't also watching for NaNs.  The work to
240      implement this would be significant too, lots of checking of inputs and
241      intermediate results.  And if <code>mpn</code> routines were to
242      participate in this (which they would have to internally) a lot of new
243      return values would need to be added, since of course there's no
244      <code>mpz_t</code> etc structure for them to indicate failure in.
245
246  <p> Stack overflow is another possible exception, but perhaps not one that
247      can be easily detected in general.  On i386 GNU/Linux for instance GCC
248      normally doesn't generate stack probes for an <code>alloca</code>, but
249      merely adjusts <code>%esp</code>.  A big enough <code>alloca</code> can
250      miss the stack redzone and hit arbitrary data.  GMP stack usage is
251      normally a function of operand size, which might be enough for some
252      applications to know they'll be safe.  Otherwise a fixed maximum usage
253      can probably be obtained by building with
254      <code>--enable-alloca=malloc-reentrant</code> (or
255      <code>notreentrant</code>).  Arranging the default to be
256      <code>alloca</code> only on blocks up to a certain size and
257      <code>malloc</code> thereafter might be a better approach and would have
258      the advantage of not having calculations limited by available stack.
259
260  <p> Actually recovering from stack overflow is of course another problem.  It
261      might be possible to catch a <code>SIGSEGV</code> in the stack redzone
262      and do something in a <code>sigaltstack</code>, on systems which have
263      that, but recovery might otherwise not be possible.  This is worth
264      bearing in mind because there's no point worrying about tight and careful
265      out-of-memory recovery if an out-of-stack is fatal.
266
267  <p> Operand overflow is another exception to be addressed.  It's easy for
268      instance to ask <code>mpz_pow_ui</code> for a result bigger than an
269      <code>mpz_t</code> can possibly represent.  Currently overflows in limb
270      or byte count calculations will go undetected.  Often they'll still end
271      up asking the memory functions for blocks bigger than available memory,
272      but that's by no means certain and results are unpredictable in general.
273      It'd be desirable to tighten up such size calculations.  Probably only
274      selected routines would need checks, if it's assumed say that no input
275      will be more than half of all memory and hence size additions like say
276      <code>mpz_mul</code> won't overflow.
277
278
279<li> <strong>Performance Tool</strong>
280
281  <p> It'd be nice to have some sort of tool for getting an overview of
282      performance.  Clearly a great many things could be done, but some primary
283      uses would be,
284
285      <ol>
286	<li> Checking speed variations between compilers.
287	<li> Checking relative performance between systems or CPUs.
288      </ol>
289
290  <p> A combination of measuring some fundamental routines and some
291      representative application routines might satisfy these.
292
293  <p> The tune/time.c routines would be the easiest way to get good accurate
294      measurements on lots of different systems.  The high level
295      <code>speed_measure</code> may or may not suit, but the basic
296      <code>speed_starttime</code> and <code>speed_endtime</code> would cover
297      lots of portability and accuracy questions.
298
299
300<li> <strong>Using <code>restrict</code></strong>
301
302  <p> There might be some value in judicious use of C99 style
303      <code>restrict</code> on various pointers, but this would need some
304      careful thought about what it implies for the various operand overlaps
305      permitted in GMP.
306
307  <p> Rumour has it some pre-C99 compilers had <code>restrict</code>, but
308      expressing tighter (or perhaps looser) requirements.  Might be worth
309      investigating that before using <code>restrict</code> unconditionally.
310
311  <p> Loops are presumably where the greatest benefit would be had, by allowing
312      the compiler to advance reads ahead of writes, perhaps as part of loop
313      unrolling.  However critical loops are generally coded in assembler, so
314      there might not be very much to gain.  And on Cray systems the explicit
315      use of <code>_Pragma</code> gives an equivalent effect.
316
317  <p> One thing to note is that Microsoft C headers (on ia64 at least) contain
318      <code>__declspec(restrict)</code>, so a <code>#define</code> of
319      <code>restrict</code> should be avoided.  It might be wisest to setup a
320      <code>gmp_restrict</code>.
321
322
323
324<li> <strong>Factorial</strong>
325
326  <p> Rewrite for simplicty and speed.  Work is in progress.
327
328
329<li> <strong>Binomial Coefficients</strong>
330
331  <p> Rewrite for simplicty and speed.  Work is in progress.
332
333
334<li> <strong>Prime Testing</strong>
335
336  <p> GMP is not really a number theory library and probably shouldn't have
337      large amounts of code dedicated to sophisticated prime testing
338      algorithms, but basic things well-implemented would suit.  Tests offering
339      certainty are probably all too big or too slow (or both!) to justify
340      inclusion in the main library.  Demo programs showing some possibilities
341      would be good though.
342
343  <p> The present "repetitions" argument to <code>mpz_probab_prime_p</code> is
344      rather specific to the Miller-Rabin tests of the current implementation.
345      Better would be some sort of parameter asking perhaps for a maximum
346      chance 1/2^x of a probable prime in fact being composite.  If
347      applications follow the advice that the present reps gives 1/4^reps
348      chance then perhaps such a change is unnecessary, but an explicitly
349      described 1/2^x would allow for changes in the implementation or even for
350      new proofs about the theory.
351
352  <p> <code>mpz_probab_prime_p</code> always initializes a new
353      <code>gmp_randstate_t</code> for randomized tests, which unfortunately
354      means it's not really very random and in particular always runs the same
355      tests for a given input.  Perhaps a new interface could accept an rstate
356      to use, so successive tests could increase confidence in the result.
357
358  <p> <code>mpn_mod_34lsub1</code> is an obvious and easy improvement to the
359      trial divisions.  And since the various prime factors are constants, the
360      remainder can be tested with something like
361<pre>
362#define MP_LIMB_DIVISIBLE_7_P(n) \
363  ((n) * MODLIMB_INVERSE_7 &lt;= MP_LIMB_T_MAX/7)
364</pre>
365      Which would help compilers that don't know how to optimize divisions by
366      constants, and is even an improvement on current gcc 3.2 code.  This
367      technique works for any modulus, see Granlund and Montgomery "Division by
368      Invariant Integers" section 9.
369
370  <p> The trial divisions are done with primes generated and grouped at
371      runtime.  This could instead be a table of data, with pre-calculated
372      inverses too.  Storing deltas, ie. amounts to add, rather than actual
373      primes would save space.  <code>udiv_qrnnd_preinv</code> style inverses
374      can be made to exist by adding dummy factors of 2 if necessary.  Some
375      thought needs to be given as to how big such a table should be, based on
376      how much dividing would be profitable for what sort of size inputs.  The
377      data could be shared by the perfect power testing.
378
379  <p> Jason Moxham points out that if a sqrt(-1) mod N exists then any factor
380      of N must be == 1 mod 4, saving half the work in trial dividing.  (If
381      x^2==-1 mod N then for a prime factor p we have x^2==-1 mod p and so the
382      jacobi symbol (-1/p)=1.  But also (-1/p)=(-1)^((p-1)/2), hence must have
383      p==1 mod 4.)  But knowing whether sqrt(-1) mod N exists is not too easy.
384      A strong pseudoprime test can reveal one, so perhaps such a test could be
385      inserted part way though the dividing.
386
387  <p> Jon Grantham "Frobenius Pseudoprimes" (www.pseudoprime.com) describes a
388      quadratic pseudoprime test taking about 3x longer than a plain test, but
389      with only a 1/7710 chance of error (whereas 3 plain Miller-Rabin tests
390      would offer only (1/4)^3 == 1/64).  Such a test needs completely random
391      parameters to satisfy the theory, though single-limb values would run
392      faster.  It's probably best to do at least one plain Miller-Rabin before
393      any quadratic tests, since that can identify composites in less total
394      time.
395
396  <p> Some thought needs to be given to the structure of which tests (trial
397      division, Miller-Rabin, quadratic) and how many are done, based on what
398      sort of inputs we expect, with a view to minimizing average time.
399
400  <p> It might be a good idea to break out subroutines for the various tests,
401      so that an application can combine them in ways it prefers, if sensible
402      defaults in <code>mpz_probab_prime_p</code> don't suit.  In particular
403      this would let applications skip tests it knew would be unprofitable,
404      like trial dividing when an input is already known to have no small
405      factors.
406
407  <p> For small inputs, combinations of theory and explicit search make it
408      relatively easy to offer certainty.  For instance numbers up to 2^32
409      could be handled with a strong pseudoprime test and table lookup.  But
410      it's rather doubtful whether a smallnum prime test belongs in a bignum
411      library.  Perhaps if it had other internal uses.
412
413  <p> An <code>mpz_nthprime</code> might be cute, but is almost certainly
414      impractical for anything but small n.
415
416
417<li> <strong>Intra-Library Calls</strong>
418
419  <p> On various systems, calls within libgmp still go through the PLT, TOC or
420      other mechanism, which makes the code bigger and slower than it needs to
421      be.
422
423  <p> The theory would be to have all GMP intra-library calls resolved directly
424      to the routines in the library.  An application wouldn't be able to
425      replace a routine, the way it can normally, but there seems no good
426      reason to do that, in normal circumstances.
427
428  <p> The <code>visibility</code> attribute in recent gcc is good for this,
429      because it lets gcc omit unnecessary GOT pointer setups or whatever if it
430      finds all calls are local and there's no global data references.
431      Documented entrypoints would be <code>protected</code>, and purely
432      internal things not wanted by test programs or anything can be
433      <code>internal</code>.
434
435  <p> Unfortunately, on i386 it seems <code>protected</code> ends up causing
436      text segment relocations within libgmp.so, meaning the library code can't
437      be shared between processes, defeating the purpose of a shared library.
438      Perhaps this is just a gremlin in binutils (debian packaged
439      2.13.90.0.16-1).
440
441  <p> The linker can be told directly (with a link script, or options) to do
442      the same sort of thing.  This doesn't change the code emitted by gcc of
443      course, but it does mean calls are resolved directly to their targets,
444      avoiding a PLT entry.
445
446  <p> Keeping symbols private to libgmp.so is probably a good thing in general
447      too, to stop anyone even attempting to access them.  But some
448      undocumented things will need or want to be kept visible, for use by
449      mpfr, or the test and tune programs.  Libtool has a standard option for
450      selecting public symbols (used now for libmp).
451
452
453<li> <strong>Math functions for the mpf layer</strong>
454
455  <p> Implement the functions of math.h for the GMP mpf layer!	Check the book
456      "Pi and the AGM" by Borwein and Borwein for ideas how to do this.  These
457      functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2,
458      cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
459
460  <p> Note that the <a href="http://mpfr.org">mpfr</a> functions already
461  provide these functions, and that we usually recommend new programs to use
462  mpfr instead of mpf.
463</ul>
464<hr>
465
466</body>
467</html>
468
469<!--
470Local variables:
471eval: (add-hook 'write-file-hooks 'time-stamp)
472time-stamp-start: "This file current as of "
473time-stamp-format: "%:d %3b %:y"
474time-stamp-end: "\\."
475time-stamp-line-limit: 50
476End:
477-->
478