xref: /netbsd-src/external/lgpl3/gmp/dist/doc/projects.html (revision ce54336801cf28877c3414aa2fcb251dddd543a2)
1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
2<html>
3<head>
4  <title>GMP Development Projects</title>
5  <link rel="shortcut icon" href="favicon.ico">
6  <link rel="stylesheet" href="gmp.css">
7  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8</head>
9
10<center>
11  <h1>
12    GMP Development Projects
13  </h1>
14</center>
15
16<font size=-1>
17<pre>
18Copyright 2000-2006, 2008-2011 Free Software Foundation, Inc.
19
20This file is part of the GNU MP Library.
21
22The GNU MP Library is free software; you can redistribute it and/or modify
23it under the terms of either:
24
25  * the GNU Lesser General Public License as published by the Free
26    Software Foundation; either version 3 of the License, or (at your
27    option) any later version.
28
29or
30
31  * the GNU General Public License as published by the Free Software
32    Foundation; either version 2 of the License, or (at your option) any
33    later version.
34
35or both in parallel, as here.
36
37The GNU MP Library is distributed in the hope that it will be useful, but
38WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
39or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
40for more details.
41
42You should have received copies of the GNU General Public License and the
43GNU Lesser General Public License along with the GNU MP Library.  If not,
44see https://www.gnu.org/licenses/.
45</pre>
46</font>
47
48<hr>
49<!-- NB. timestamp updated automatically by emacs -->
50  This file current as of 29 Jan 2014.  An up-to-date version is available at
51  <a href="https://gmplib.org/projects.html">https://gmplib.org/projects.html</a>.
52  Please send comments about this page to gmp-devel<font>@</font>gmplib.org.
53
54<p> This file lists projects suitable for volunteers.  Please see the
55    <a href="tasks.html">tasks file</a> for smaller tasks.
56
57<p> If you want to work on any of the projects below, please let
58    gmp-devel<font>@</font>gmplib.org know.  If you want to help with a project
59    that already somebody else is working on, you will get in touch through
60    gmp-devel<font>@</font>gmplib.org.  (There are no email addresses of
61    volunteers below, due to spamming problems.)
62
63<ul>
64<li> <strong>Faster multiplication</strong>
65
66  <ol>
67
68    <li> Work on the algorithm selection code for unbalanced multiplication.
69
70    <li> Implement an FFT variant computing the coefficients mod m different
71	 limb size primes of the form l*2^k+1. i.e., compute m separate FFTs.
72	 The wanted coefficients will at the end be found by lifting with CRT
73	 (Chinese Remainder Theorem).  If we let m = 3, i.e., use 3 primes, we
74	 can split the operands into coefficients at limb boundaries, and if
75	 our machine uses b-bit limbs, we can multiply numbers with close to
76	 2^b limbs without coefficient overflow.  For smaller multiplication,
77	 we might perhaps let m = 1, and instead of splitting our operands at
78	 limb boundaries, split them in much smaller pieces.  We might also use
79	 4 or more primes, and split operands into bigger than b-bit chunks.
80	 By using more primes, the gain in shorter transform length, but lose
81	 in having to do more FFTs, but that is a slight total save.  We then
82	 lose in more expensive CRT. <br><br>
83
84	 <p> [We now have two implementations of this algorithm, one by Tommy
85	 Färnqvist and one by Niels Möller.]
86
87    <li> Work on short products.  Our mullo and mulmid are probably K, but we
88         lack mulhi.
89
90  </ol>
91
92  <p> Another possibility would be an optimized cube.  In the basecase that
93      should definitely be able to save cross-products in a similar fashion to
94      squaring, but some investigation might be needed for how best to adapt
95      the higher-order algorithms.  Not sure whether cubing or further small
96      powers have any particularly important uses though.
97
98
99<li> <strong>Assembly routines</strong>
100
101  <p> Write new and improve existing assembly routines.  The tests/devel
102      programs and the tune/speed.c and tune/many.pl programs are useful for
103      testing and timing the routines you write.  See the README files in those
104      directories for more information.
105
106  <p> Please make sure your new routines are fast for these three situations:
107      <ol>
108	<li> Small operands of less than, say, 10 limbs.
109	<li> Medium size operands, that fit into the cache.
110	<li> Huge operands that does not fit into the cache.
111      </ol>
112
113  <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
114      mpn_sqr_basecase.  The latter two don't exist for all machines, while
115      mpn_addmul_1 exists for almost all machines.
116
117  <p> Standard techniques for these routines are unrolling, software
118      pipelining, and specialization for common operand values.  For machines
119      with poor integer multiplication, it is sometimes possible to remedy the
120      situation using floating-point operations or SIMD operations such as MMX
121      (x86) (x86), SSE (x86), VMX (PowerPC), VIS (Sparc).
122
123  <p> Using floating-point operations is interesting but somewhat tricky.
124      Since IEEE double has 53 bit of mantissa, one has to split the operands
125      in small pieces, so that no intermediates are greater than 2^53.  For
126      32-bit computers, splitting one operand into 16-bit pieces works.  For
127      64-bit machines, one operand can be split into 21-bit pieces and the
128      other into 32-bit pieces.  (A 64-bit operand can be split into just three
129      21-bit pieces if one allows the split operands to be negative!)
130
131
132<li> <strong>Faster sqrt</strong>
133
134  <p> The current code uses divisions, which are reasonably fast, but it'd be
135      possible to use only multiplications by computing 1/sqrt(A) using this
136      iteration:
137      <pre>
138				    2
139		   x   = x  (3 &minus; A x )/2
140		    i+1	  i	    i  </pre>
141      The square root can then be computed like this:
142      <pre>
143		     sqrt(A) = A x
144				  n  </pre>
145  <p> That final multiply might be the full size of the input (though it might
146      only need the high half of that), so there may or may not be any speedup
147      overall.
148
149  <p> We should probably allow a special exponent-like parameter, to speed
150      computations of a precise square root of a small number in mpf and mpfr.
151
152
153<li> <strong>Nth root</strong>
154
155  <p> Improve mpn_rootrem.  The current code is not too bad, but its time
156      complexity is a function of the input, while it is possible to make
157      the <i>average</i> complexity a function of the output.
158
159
160<li> <strong>Fat binaries</strong>
161
162  <p> Add more functions to the set of fat functions.
163
164  <p> The speed of multiplication is today highly dependent on combination
165  functions like <code>addlsh1_n</code>.  A fat binary will never use any such
166  functions, since they are classified as optional.  Ideally, we should use
167  them, but making the current compile-time selections of optional functions
168  become run-time selections for fat binaries.
169
170  <p> If we make fat binaries work really well, we should move away frm tehe
171  current configure scheme (at least by default) and instead include all code
172  always.
173
174
175<li> <strong>Exceptions</strong>
176
177  <p> Some sort of scheme for exceptions handling would be desirable.
178      Presently the only thing documented is that divide by zero in GMP
179      functions provokes a deliberate machine divide by zero (on those systems
180      where such a thing exists at least).  The global <code>gmp_errno</code>
181      is not actually documented, except for the old <code>gmp_randinit</code>
182      function.  Being currently just a plain global means it's not
183      thread-safe.
184
185  <p> The basic choices for exceptions are returning an error code or having a
186      handler function to be called.  The disadvantage of error returns is they
187      have to be checked, leading to tedious and rarely executed code, and
188      strictly speaking such a scheme wouldn't be source or binary compatible.
189      The disadvantage of a handler function is that a <code>longjmp</code> or
190      similar recovery from it may be difficult.  A combination would be
191      possible, for instance by allowing the handler to return an error code.
192
193  <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
194      normally be detected at the start of functions, so exception handling
195      would have a clean state.  What's worth considering though is that the
196      GMP function detecting the exception may have been called via some third
197      party library or self contained application module, and hence have
198      various bits of state to be cleaned up above it.  It'd be highly
199      desirable for an exceptions scheme to allow for such cleanups.
200
201  <p> The C++ destructor mechanism could help with cleanups both internally and
202      externally, but being a plain C library we don't want to depend on that.
203
204  <p> A C++ <code>throw</code> might be a good optional extra exceptions
205      mechanism, perhaps under a build option.  For
206      GCC <code>-fexceptions</code> will add the necessary frame information to
207      plain C code, or GMP could be compiled as C++.
208
209  <p> Out-of-memory exceptions are expected to be handled by the
210      <code>mp_set_memory_functions</code> routines, rather than being a
211      prospective part of divide-by-zero etc.  Some similar considerations
212      apply but what differs is that out-of-memory can arise deep within GMP
213      internals.  Even fundamental routines like <code>mpn_add_n</code> and
214      <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
215      vector systems).  Allowing for an error code return would require an
216      awful lot of checking internally.  Perhaps it'd still be worthwhile, but
217      it'd be a lot of changes and the extra code would probably be rather
218      rarely executed in normal usages.
219
220  <p> A <code>longjmp</code> recovery for out-of-memory will currently, in
221      general, lead to memory leaks and may leave GMP variables operated on in
222      inconsistent states.  Maybe it'd be possible to record recovery
223      information for use by the relevant allocate or reallocate function, but
224      that too would be a lot of changes.
225
226  <p> One scheme for out-of-memory would be to note that all GMP allocations go
227      through the <code>mp_set_memory_functions</code> routines.  So if the
228      application has an intended <code>setjmp</code> recovery point it can
229      record memory activity by GMP and abandon space allocated and variables
230      initialized after that point.  This might be as simple as directing the
231      allocation functions to a separate pool, but in general would have the
232      disadvantage of needing application-level bookkeeping on top of the
233      normal system <code>malloc</code>.  An advantage however is that it needs
234      nothing from GMP itself and on that basis doesn't burden applications not
235      needing recovery.  Note that there's probably some details to be worked
236      out here about reallocs of existing variables, and perhaps about copying
237      or swapping between "permanent" and "temporary" variables.
238
239  <p> Applications desiring a fine-grained error control, for instance a
240      language interpreter, would very possibly not be well served by a scheme
241      requiring <code>longjmp</code>.  Wrapping every GMP function call with a
242      <code>setjmp</code> would be very inconvenient.
243
244  <p> Another option would be to let <code>mpz_t</code> etc hold a sort of NaN,
245      a special value indicating an out-of-memory or other failure.  This would
246      be similar to NaNs in mpfr.  Unfortunately such a scheme could only be
247      used by programs prepared to handle such special values, since for
248      instance a program waiting for some condition to be satisfied could
249      become an infinite loop if it wasn't also watching for NaNs.  The work to
250      implement this would be significant too, lots of checking of inputs and
251      intermediate results.  And if <code>mpn</code> routines were to
252      participate in this (which they would have to internally) a lot of new
253      return values would need to be added, since of course there's no
254      <code>mpz_t</code> etc structure for them to indicate failure in.
255
256  <p> Stack overflow is another possible exception, but perhaps not one that
257      can be easily detected in general.  On i386 GNU/Linux for instance GCC
258      normally doesn't generate stack probes for an <code>alloca</code>, but
259      merely adjusts <code>%esp</code>.  A big enough <code>alloca</code> can
260      miss the stack redzone and hit arbitrary data.  GMP stack usage is
261      normally a function of operand size, which might be enough for some
262      applications to know they'll be safe.  Otherwise a fixed maximum usage
263      can probably be obtained by building with
264      <code>--enable-alloca=malloc-reentrant</code> (or
265      <code>notreentrant</code>).  Arranging the default to be
266      <code>alloca</code> only on blocks up to a certain size and
267      <code>malloc</code> thereafter might be a better approach and would have
268      the advantage of not having calculations limited by available stack.
269
270  <p> Actually recovering from stack overflow is of course another problem.  It
271      might be possible to catch a <code>SIGSEGV</code> in the stack redzone
272      and do something in a <code>sigaltstack</code>, on systems which have
273      that, but recovery might otherwise not be possible.  This is worth
274      bearing in mind because there's no point worrying about tight and careful
275      out-of-memory recovery if an out-of-stack is fatal.
276
277  <p> Operand overflow is another exception to be addressed.  It's easy for
278      instance to ask <code>mpz_pow_ui</code> for a result bigger than an
279      <code>mpz_t</code> can possibly represent.  Currently overflows in limb
280      or byte count calculations will go undetected.  Often they'll still end
281      up asking the memory functions for blocks bigger than available memory,
282      but that's by no means certain and results are unpredictable in general.
283      It'd be desirable to tighten up such size calculations.  Probably only
284      selected routines would need checks, if it's assumed say that no input
285      will be more than half of all memory and hence size additions like say
286      <code>mpz_mul</code> won't overflow.
287
288
289<li> <strong>Performance Tool</strong>
290
291  <p> It'd be nice to have some sort of tool for getting an overview of
292      performance.  Clearly a great many things could be done, but some primary
293      uses would be,
294
295      <ol>
296	<li> Checking speed variations between compilers.
297	<li> Checking relative performance between systems or CPUs.
298      </ol>
299
300  <p> A combination of measuring some fundamental routines and some
301      representative application routines might satisfy these.
302
303  <p> The tune/time.c routines would be the easiest way to get good accurate
304      measurements on lots of different systems.  The high level
305      <code>speed_measure</code> may or may not suit, but the basic
306      <code>speed_starttime</code> and <code>speed_endtime</code> would cover
307      lots of portability and accuracy questions.
308
309
310<li> <strong>Using <code>restrict</code></strong>
311
312  <p> There might be some value in judicious use of C99 style
313      <code>restrict</code> on various pointers, but this would need some
314      careful thought about what it implies for the various operand overlaps
315      permitted in GMP.
316
317  <p> Rumour has it some pre-C99 compilers had <code>restrict</code>, but
318      expressing tighter (or perhaps looser) requirements.  Might be worth
319      investigating that before using <code>restrict</code> unconditionally.
320
321  <p> Loops are presumably where the greatest benefit would be had, by allowing
322      the compiler to advance reads ahead of writes, perhaps as part of loop
323      unrolling.  However critical loops are generally coded in assembler, so
324      there might not be very much to gain.  And on Cray systems the explicit
325      use of <code>_Pragma</code> gives an equivalent effect.
326
327  <p> One thing to note is that Microsoft C headers (on ia64 at least) contain
328      <code>__declspec(restrict)</code>, so a <code>#define</code> of
329      <code>restrict</code> should be avoided.  It might be wisest to setup a
330      <code>gmp_restrict</code>.
331
332
333<li> <strong>Prime Testing</strong>
334
335  <p> GMP is not really a number theory library and probably shouldn't have
336      large amounts of code dedicated to sophisticated prime testing
337      algorithms, but basic things well-implemented would suit.  Tests offering
338      certainty are probably all too big or too slow (or both!) to justify
339      inclusion in the main library.  Demo programs showing some possibilities
340      would be good though.
341
342  <p> The present "repetitions" argument to <code>mpz_probab_prime_p</code> is
343      rather specific to the Miller-Rabin tests of the current implementation.
344      Better would be some sort of parameter asking perhaps for a maximum
345      chance 1/2^x of a probable prime in fact being composite.  If
346      applications follow the advice that the present reps gives 1/4^reps
347      chance then perhaps such a change is unnecessary, but an explicitly
348      described 1/2^x would allow for changes in the implementation or even for
349      new proofs about the theory.
350
351  <p> <code>mpz_probab_prime_p</code> always initializes a new
352      <code>gmp_randstate_t</code> for randomized tests, which unfortunately
353      means it's not really very random and in particular always runs the same
354      tests for a given input.  Perhaps a new interface could accept an rstate
355      to use, so successive tests could increase confidence in the result.
356
357  <p> <code>mpn_mod_34lsub1</code> is an obvious and easy improvement to the
358      trial divisions.  And since the various prime factors are constants, the
359      remainder can be tested with something like
360<pre>
361#define MP_LIMB_DIVISIBLE_7_P(n) \
362  ((n) * MODLIMB_INVERSE_7 &lt;= MP_LIMB_T_MAX/7)
363</pre>
364      Which would help compilers that don't know how to optimize divisions by
365      constants, and is even an improvement on current gcc 3.2 code.  This
366      technique works for any modulus, see Granlund and Montgomery "Division by
367      Invariant Integers" section 9.
368
369  <p> The trial divisions are done with primes generated and grouped at
370      runtime.  This could instead be a table of data, with pre-calculated
371      inverses too.  Storing deltas, ie. amounts to add, rather than actual
372      primes would save space.  <code>udiv_qrnnd_preinv</code> style inverses
373      can be made to exist by adding dummy factors of 2 if necessary.  Some
374      thought needs to be given as to how big such a table should be, based on
375      how much dividing would be profitable for what sort of size inputs.  The
376      data could be shared by the perfect power testing.
377
378  <p> Jason Moxham points out that if a sqrt(-1) mod N exists then any factor
379      of N must be == 1 mod 4, saving half the work in trial dividing.  (If
380      x^2==-1 mod N then for a prime factor p we have x^2==-1 mod p and so the
381      jacobi symbol (-1/p)=1.  But also (-1/p)=(-1)^((p-1)/2), hence must have
382      p==1 mod 4.)  But knowing whether sqrt(-1) mod N exists is not too easy.
383      A strong pseudoprime test can reveal one, so perhaps such a test could be
384      inserted part way though the dividing.
385
386  <p> Jon Grantham "Frobenius Pseudoprimes" (www.pseudoprime.com) describes a
387      quadratic pseudoprime test taking about 3x longer than a plain test, but
388      with only a 1/7710 chance of error (whereas 3 plain Miller-Rabin tests
389      would offer only (1/4)^3 == 1/64).  Such a test needs completely random
390      parameters to satisfy the theory, though single-limb values would run
391      faster.  It's probably best to do at least one plain Miller-Rabin before
392      any quadratic tests, since that can identify composites in less total
393      time.
394
395  <p> Some thought needs to be given to the structure of which tests (trial
396      division, Miller-Rabin, quadratic) and how many are done, based on what
397      sort of inputs we expect, with a view to minimizing average time.
398
399  <p> It might be a good idea to break out subroutines for the various tests,
400      so that an application can combine them in ways it prefers, if sensible
401      defaults in <code>mpz_probab_prime_p</code> don't suit.  In particular
402      this would let applications skip tests it knew would be unprofitable,
403      like trial dividing when an input is already known to have no small
404      factors.
405
406  <p> For small inputs, combinations of theory and explicit search make it
407      relatively easy to offer certainty.  For instance numbers up to 2^32
408      could be handled with a strong pseudoprime test and table lookup.  But
409      it's rather doubtful whether a smallnum prime test belongs in a bignum
410      library.  Perhaps if it had other internal uses.
411
412  <p> An <code>mpz_nthprime</code> might be cute, but is almost certainly
413      impractical for anything but small n.
414
415
416<li> <strong>Intra-Library Calls</strong>
417
418  <p> On various systems, calls within libgmp still go through the PLT, TOC or
419      other mechanism, which makes the code bigger and slower than it needs to
420      be.
421
422  <p> The theory would be to have all GMP intra-library calls resolved directly
423      to the routines in the library.  An application wouldn't be able to
424      replace a routine, the way it can normally, but there seems no good
425      reason to do that, in normal circumstances.
426
427  <p> The <code>visibility</code> attribute in recent gcc is good for this,
428      because it lets gcc omit unnecessary GOT pointer setups or whatever if it
429      finds all calls are local and there's no global data references.
430      Documented entrypoints would be <code>protected</code>, and purely
431      internal things not wanted by test programs or anything can be
432      <code>internal</code>.
433
434  <p> Unfortunately, on i386 it seems <code>protected</code> ends up causing
435      text segment relocations within libgmp.so, meaning the library code can't
436      be shared between processes, defeating the purpose of a shared library.
437      Perhaps this is just a gremlin in binutils (debian packaged
438      2.13.90.0.16-1).
439
440  <p> The linker can be told directly (with a link script, or options) to do
441      the same sort of thing.  This doesn't change the code emitted by gcc of
442      course, but it does mean calls are resolved directly to their targets,
443      avoiding a PLT entry.
444
445  <p> Keeping symbols private to libgmp.so is probably a good thing in general
446      too, to stop anyone even attempting to access them.  But some
447      undocumented things will need or want to be kept visible, for use by
448      mpfr, or the test and tune programs.  Libtool has a standard option for
449      selecting public symbols (used now for libmp).
450
451
452<li> <strong>Math functions for the mpf layer</strong>
453
454  <p> Implement the functions of math.h for the GMP mpf layer!	Check the book
455      "Pi and the AGM" by Borwein and Borwein for ideas how to do this.  These
456      functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2,
457      cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
458
459  <p> Note that the <a href="http://mpfr.org">mpfr</a> functions already
460  provide these functions, and that we usually recommend new programs to use
461  mpfr instead of mpf.
462</ul>
463<hr>
464
465</body>
466</html>
467
468<!--
469Local variables:
470eval: (add-hook 'write-file-hooks 'time-stamp)
471time-stamp-start: "This file current as of "
472time-stamp-format: "%:d %3b %:y"
473time-stamp-end: "\\."
474time-stamp-line-limit: 50
475End:
476-->
477