1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> 2<html> 3<head> 4 <title>GMP Development Projects</title> 5 <link rel="shortcut icon" href="favicon.ico"> 6 <link rel="stylesheet" href="gmp.css"> 7 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 8</head> 9 10<center> 11 <h1> 12 GMP Development Projects 13 </h1> 14</center> 15 16<font size=-1> 17<pre> 18Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2008, 2009, 2010, 2011 19Free Software Foundation, Inc. 20 21This file is part of the GNU MP Library. 22 23The GNU MP Library is free software; you can redistribute it and/or modify 24it under the terms of the GNU Lesser General Public License as published 25by the Free Software Foundation; either version 3 of the License, or (at 26your option) any later version. 27 28The GNU MP Library is distributed in the hope that it will be useful, but 29WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY 30or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public 31License for more details. 32 33You should have received a copy of the GNU Lesser General Public License 34along with the GNU MP Library. If not, see http://www.gnu.org/licenses/. 35</pre> 36</font> 37 38<hr> 39<!-- NB. timestamp updated automatically by emacs --> 40 This file current as of 5 Dec 2011. An up-to-date version is available at 41 <a href="http://gmplib.org/projects.html">http://gmplib.org/projects.html</a>. 42 Please send comments about this page to gmp-devel<font>@</font>gmplib.org. 43 44<p> This file lists projects suitable for volunteers. Please see the 45 <a href="tasks.html">tasks file</a> for smaller tasks. 46 47<p> If you want to work on any of the projects below, please let 48 gmp-devel<font>@</font>gmplib.org know. If you want to help with a project 49 that already somebody else is working on, you will get in touch through 50 gmp-devel<font>@</font>gmplib.org. (There are no email addresses of 51 volunteers below, due to spamming problems.) 52 53<ul> 54<li> <strong>Faster multiplication</strong> 55 56 <ol> 57 58 <li> Work on the algorithm selection code for unbalanced multiplication. 59 60 <li> Implement an FFT variant computing the coefficients mod m different 61 limb size primes of the form l*2^k+1. i.e., compute m separate FFTs. 62 The wanted coefficients will at the end be found by lifting with CRT 63 (Chinese Remainder Theorem). If we let m = 3, i.e., use 3 primes, we 64 can split the operands into coefficients at limb boundaries, and if 65 our machine uses b-bit limbs, we can multiply numbers with close to 66 2^b limbs without coefficient overflow. For smaller multiplication, 67 we might perhaps let m = 1, and instead of splitting our operands at 68 limb boundaries, split them in much smaller pieces. We might also use 69 4 or more primes, and split operands into bigger than b-bit chunks. 70 By using more primes, the gain in shorter transform length, but lose 71 in having to do more FFTs, but that is a slight total save. We then 72 lose in more expensive CRT. <br><br> 73 74 <p> [We now have two implementations of this algorithm, one by Tommy 75 F�rnqvist and one by Niels M�ller.] 76 77 <li> Work on short products. Our mullo and mulmid are probably K, but we 78 lack mulhi. 79 80 </ol> 81 82 <p> Another possibility would be an optimized cube. In the basecase that 83 should definitely be able to save cross-products in a similar fashion to 84 squaring, but some investigation might be needed for how best to adapt 85 the higher-order algorithms. Not sure whether cubing or further small 86 powers have any particularly important uses though. 87 88 89<li> <strong>Assembly routines</strong> 90 91 <p> Write new and improve existing assembly routines. The tests/devel 92 programs and the tune/speed.c and tune/many.pl programs are useful for 93 testing and timing the routines you write. See the README files in those 94 directories for more information. 95 96 <p> Please make sure your new routines are fast for these three situations: 97 <ol> 98 <li> Small operands of less than, say, 10 limbs. 99 <li> Medium size operands, that fit into the cache. 100 <li> Huge operands that does not fit into the cache. 101 </ol> 102 103 <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and 104 mpn_sqr_basecase. The latter two don't exist for all machines, while 105 mpn_addmul_1 exists for almost all machines. 106 107 <p> Standard techniques for these routines are unrolling, software 108 pipelining, and specialization for common operand values. For machines 109 with poor integer multiplication, it is sometimes possible to remedy the 110 situation using floating-point operations or SIMD operations such as MMX 111 (x86) (x86), SSE (x86), VMX (PowerPC), VIS (Sparc). 112 113 <p> Using floating-point operations is interesting but somewhat tricky. 114 Since IEEE double has 53 bit of mantissa, one has to split the operands 115 in small pieces, so that no intermediates are greater than 2^53. For 116 32-bit computers, splitting one operand into 16-bit pieces works. For 117 64-bit machines, one operand can be split into 21-bit pieces and the 118 other into 32-bit pieces. (A 64-bit operand can be split into just three 119 21-bit pieces if one allows the split operands to be negative!) 120 121 122<li> <strong>Faster sqrt</strong> 123 124 <p> The current code uses divisions, which are reasonably fast, but it'd be 125 possible to use only multiplications by computing 1/sqrt(A) using this 126 iteration: 127 <pre> 128 2 129 x = x (3 − A x )/2 130 i+1 i i </pre> 131 The square root can then be computed like this: 132 <pre> 133 sqrt(A) = A x 134 n </pre> 135 <p> That final multiply might be the full size of the input (though it might 136 only need the high half of that), so there may or may not be any speedup 137 overall. 138 139 <p> We should probably allow a special exponent-like parameter, to speed 140 computations of a precise square root of a small number in mpf and mpfr. 141 142 143<li> <strong>Nth root</strong> 144 145 <p> Improve mpn_rootrem. The current code is not too bad, but its time 146 complexity is a function of the input, while it is possible to make 147 the <i>average</i> complexity a function of the output. 148 149 150<li> <strong>Fat binaries</strong> 151 152 <p> Add more functions to the set of fat functions. 153 154 <p> The speed of multipliciaton is today highly dependent on combination 155 functions like <code>addlsh1_n</code>. A fat binary will never use any such 156 functions, since they are classified as optional. Ideally, we should use 157 them, but making the current compile-time selections of optional functions 158 become run-time selections for fat binaries. 159 160 <p> If we make fat binaries work really well, we should move away frm tehe 161 current configure scheme (at least by default) and instead include all code 162 always. 163 164 165<li> <strong>Exceptions</strong> 166 167 <p> Some sort of scheme for exceptions handling would be desirable. 168 Presently the only thing documented is that divide by zero in GMP 169 functions provokes a deliberate machine divide by zero (on those systems 170 where such a thing exists at least). The global <code>gmp_errno</code> 171 is not actually documented, except for the old <code>gmp_randinit</code> 172 function. Being currently just a plain global means it's not 173 thread-safe. 174 175 <p> The basic choices for exceptions are returning an error code or having a 176 handler function to be called. The disadvantage of error returns is they 177 have to be checked, leading to tedious and rarely executed code, and 178 strictly speaking such a scheme wouldn't be source or binary compatible. 179 The disadvantage of a handler function is that a <code>longjmp</code> or 180 similar recovery from it may be difficult. A combination would be 181 possible, for instance by allowing the handler to return an error code. 182 183 <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can 184 normally be detected at the start of functions, so exception handling 185 would have a clean state. What's worth considering though is that the 186 GMP function detecting the exception may have been called via some third 187 party library or self contained application module, and hence have 188 various bits of state to be cleaned up above it. It'd be highly 189 desirable for an exceptions scheme to allow for such cleanups. 190 191 <p> The C++ destructor mechanism could help with cleanups both internally and 192 externally, but being a plain C library we don't want to depend on that. 193 194 <p> A C++ <code>throw</code> might be a good optional extra exceptions 195 mechanism, perhaps under a build option. For 196 GCC <code>-fexceptions</code> will add the necessary frame information to 197 plain C code, or GMP could be compiled as C++. 198 199 <p> Out-of-memory exceptions are expected to be handled by the 200 <code>mp_set_memory_functions</code> routines, rather than being a 201 prospective part of divide-by-zero etc. Some similar considerations 202 apply but what differs is that out-of-memory can arise deep within GMP 203 internals. Even fundamental routines like <code>mpn_add_n</code> and 204 <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray 205 vector systems). Allowing for an error code return would require an 206 awful lot of checking internally. Perhaps it'd still be worthwhile, but 207 it'd be a lot of changes and the extra code would probably be rather 208 rarely executed in normal usages. 209 210 <p> A <code>longjmp</code> recovery for out-of-memory will currently, in 211 general, lead to memory leaks and may leave GMP variables operated on in 212 inconsistent states. Maybe it'd be possible to record recovery 213 information for use by the relevant allocate or reallocate function, but 214 that too would be a lot of changes. 215 216 <p> One scheme for out-of-memory would be to note that all GMP allocations go 217 through the <code>mp_set_memory_functions</code> routines. So if the 218 application has an intended <code>setjmp</code> recovery point it can 219 record memory activity by GMP and abandon space allocated and variables 220 initialized after that point. This might be as simple as directing the 221 allocation functions to a separate pool, but in general would have the 222 disadvantage of needing application-level bookkeeping on top of the 223 normal system <code>malloc</code>. An advantage however is that it needs 224 nothing from GMP itself and on that basis doesn't burden applications not 225 needing recovery. Note that there's probably some details to be worked 226 out here about reallocs of existing variables, and perhaps about copying 227 or swapping between "permanent" and "temporary" variables. 228 229 <p> Applications desiring a fine-grained error control, for instance a 230 language interpreter, would very possibly not be well served by a scheme 231 requiring <code>longjmp</code>. Wrapping every GMP function call with a 232 <code>setjmp</code> would be very inconvenient. 233 234 <p> Another option would be to let <code>mpz_t</code> etc hold a sort of NaN, 235 a special value indicating an out-of-memory or other failure. This would 236 be similar to NaNs in mpfr. Unfortunately such a scheme could only be 237 used by programs prepared to handle such special values, since for 238 instance a program waiting for some condition to be satisfied could 239 become an infinite loop if it wasn't also watching for NaNs. The work to 240 implement this would be significant too, lots of checking of inputs and 241 intermediate results. And if <code>mpn</code> routines were to 242 participate in this (which they would have to internally) a lot of new 243 return values would need to be added, since of course there's no 244 <code>mpz_t</code> etc structure for them to indicate failure in. 245 246 <p> Stack overflow is another possible exception, but perhaps not one that 247 can be easily detected in general. On i386 GNU/Linux for instance GCC 248 normally doesn't generate stack probes for an <code>alloca</code>, but 249 merely adjusts <code>%esp</code>. A big enough <code>alloca</code> can 250 miss the stack redzone and hit arbitrary data. GMP stack usage is 251 normally a function of operand size, which might be enough for some 252 applications to know they'll be safe. Otherwise a fixed maximum usage 253 can probably be obtained by building with 254 <code>--enable-alloca=malloc-reentrant</code> (or 255 <code>notreentrant</code>). Arranging the default to be 256 <code>alloca</code> only on blocks up to a certain size and 257 <code>malloc</code> thereafter might be a better approach and would have 258 the advantage of not having calculations limited by available stack. 259 260 <p> Actually recovering from stack overflow is of course another problem. It 261 might be possible to catch a <code>SIGSEGV</code> in the stack redzone 262 and do something in a <code>sigaltstack</code>, on systems which have 263 that, but recovery might otherwise not be possible. This is worth 264 bearing in mind because there's no point worrying about tight and careful 265 out-of-memory recovery if an out-of-stack is fatal. 266 267 <p> Operand overflow is another exception to be addressed. It's easy for 268 instance to ask <code>mpz_pow_ui</code> for a result bigger than an 269 <code>mpz_t</code> can possibly represent. Currently overflows in limb 270 or byte count calculations will go undetected. Often they'll still end 271 up asking the memory functions for blocks bigger than available memory, 272 but that's by no means certain and results are unpredictable in general. 273 It'd be desirable to tighten up such size calculations. Probably only 274 selected routines would need checks, if it's assumed say that no input 275 will be more than half of all memory and hence size additions like say 276 <code>mpz_mul</code> won't overflow. 277 278 279<li> <strong>Performance Tool</strong> 280 281 <p> It'd be nice to have some sort of tool for getting an overview of 282 performance. Clearly a great many things could be done, but some primary 283 uses would be, 284 285 <ol> 286 <li> Checking speed variations between compilers. 287 <li> Checking relative performance between systems or CPUs. 288 </ol> 289 290 <p> A combination of measuring some fundamental routines and some 291 representative application routines might satisfy these. 292 293 <p> The tune/time.c routines would be the easiest way to get good accurate 294 measurements on lots of different systems. The high level 295 <code>speed_measure</code> may or may not suit, but the basic 296 <code>speed_starttime</code> and <code>speed_endtime</code> would cover 297 lots of portability and accuracy questions. 298 299 300<li> <strong>Using <code>restrict</code></strong> 301 302 <p> There might be some value in judicious use of C99 style 303 <code>restrict</code> on various pointers, but this would need some 304 careful thought about what it implies for the various operand overlaps 305 permitted in GMP. 306 307 <p> Rumour has it some pre-C99 compilers had <code>restrict</code>, but 308 expressing tighter (or perhaps looser) requirements. Might be worth 309 investigating that before using <code>restrict</code> unconditionally. 310 311 <p> Loops are presumably where the greatest benefit would be had, by allowing 312 the compiler to advance reads ahead of writes, perhaps as part of loop 313 unrolling. However critical loops are generally coded in assembler, so 314 there might not be very much to gain. And on Cray systems the explicit 315 use of <code>_Pragma</code> gives an equivalent effect. 316 317 <p> One thing to note is that Microsoft C headers (on ia64 at least) contain 318 <code>__declspec(restrict)</code>, so a <code>#define</code> of 319 <code>restrict</code> should be avoided. It might be wisest to setup a 320 <code>gmp_restrict</code>. 321 322 323 324<li> <strong>Factorial</strong> 325 326 <p> Rewrite for simplicty and speed. Work is in progress. 327 328 329<li> <strong>Binomial Coefficients</strong> 330 331 <p> Rewrite for simplicty and speed. Work is in progress. 332 333 334<li> <strong>Prime Testing</strong> 335 336 <p> GMP is not really a number theory library and probably shouldn't have 337 large amounts of code dedicated to sophisticated prime testing 338 algorithms, but basic things well-implemented would suit. Tests offering 339 certainty are probably all too big or too slow (or both!) to justify 340 inclusion in the main library. Demo programs showing some possibilities 341 would be good though. 342 343 <p> The present "repetitions" argument to <code>mpz_probab_prime_p</code> is 344 rather specific to the Miller-Rabin tests of the current implementation. 345 Better would be some sort of parameter asking perhaps for a maximum 346 chance 1/2^x of a probable prime in fact being composite. If 347 applications follow the advice that the present reps gives 1/4^reps 348 chance then perhaps such a change is unnecessary, but an explicitly 349 described 1/2^x would allow for changes in the implementation or even for 350 new proofs about the theory. 351 352 <p> <code>mpz_probab_prime_p</code> always initializes a new 353 <code>gmp_randstate_t</code> for randomized tests, which unfortunately 354 means it's not really very random and in particular always runs the same 355 tests for a given input. Perhaps a new interface could accept an rstate 356 to use, so successive tests could increase confidence in the result. 357 358 <p> <code>mpn_mod_34lsub1</code> is an obvious and easy improvement to the 359 trial divisions. And since the various prime factors are constants, the 360 remainder can be tested with something like 361<pre> 362#define MP_LIMB_DIVISIBLE_7_P(n) \ 363 ((n) * MODLIMB_INVERSE_7 <= MP_LIMB_T_MAX/7) 364</pre> 365 Which would help compilers that don't know how to optimize divisions by 366 constants, and is even an improvement on current gcc 3.2 code. This 367 technique works for any modulus, see Granlund and Montgomery "Division by 368 Invariant Integers" section 9. 369 370 <p> The trial divisions are done with primes generated and grouped at 371 runtime. This could instead be a table of data, with pre-calculated 372 inverses too. Storing deltas, ie. amounts to add, rather than actual 373 primes would save space. <code>udiv_qrnnd_preinv</code> style inverses 374 can be made to exist by adding dummy factors of 2 if necessary. Some 375 thought needs to be given as to how big such a table should be, based on 376 how much dividing would be profitable for what sort of size inputs. The 377 data could be shared by the perfect power testing. 378 379 <p> Jason Moxham points out that if a sqrt(-1) mod N exists then any factor 380 of N must be == 1 mod 4, saving half the work in trial dividing. (If 381 x^2==-1 mod N then for a prime factor p we have x^2==-1 mod p and so the 382 jacobi symbol (-1/p)=1. But also (-1/p)=(-1)^((p-1)/2), hence must have 383 p==1 mod 4.) But knowing whether sqrt(-1) mod N exists is not too easy. 384 A strong pseudoprime test can reveal one, so perhaps such a test could be 385 inserted part way though the dividing. 386 387 <p> Jon Grantham "Frobenius Pseudoprimes" (www.pseudoprime.com) describes a 388 quadratic pseudoprime test taking about 3x longer than a plain test, but 389 with only a 1/7710 chance of error (whereas 3 plain Miller-Rabin tests 390 would offer only (1/4)^3 == 1/64). Such a test needs completely random 391 parameters to satisfy the theory, though single-limb values would run 392 faster. It's probably best to do at least one plain Miller-Rabin before 393 any quadratic tests, since that can identify composites in less total 394 time. 395 396 <p> Some thought needs to be given to the structure of which tests (trial 397 division, Miller-Rabin, quadratic) and how many are done, based on what 398 sort of inputs we expect, with a view to minimizing average time. 399 400 <p> It might be a good idea to break out subroutines for the various tests, 401 so that an application can combine them in ways it prefers, if sensible 402 defaults in <code>mpz_probab_prime_p</code> don't suit. In particular 403 this would let applications skip tests it knew would be unprofitable, 404 like trial dividing when an input is already known to have no small 405 factors. 406 407 <p> For small inputs, combinations of theory and explicit search make it 408 relatively easy to offer certainty. For instance numbers up to 2^32 409 could be handled with a strong pseudoprime test and table lookup. But 410 it's rather doubtful whether a smallnum prime test belongs in a bignum 411 library. Perhaps if it had other internal uses. 412 413 <p> An <code>mpz_nthprime</code> might be cute, but is almost certainly 414 impractical for anything but small n. 415 416 417<li> <strong>Intra-Library Calls</strong> 418 419 <p> On various systems, calls within libgmp still go through the PLT, TOC or 420 other mechanism, which makes the code bigger and slower than it needs to 421 be. 422 423 <p> The theory would be to have all GMP intra-library calls resolved directly 424 to the routines in the library. An application wouldn't be able to 425 replace a routine, the way it can normally, but there seems no good 426 reason to do that, in normal circumstances. 427 428 <p> The <code>visibility</code> attribute in recent gcc is good for this, 429 because it lets gcc omit unnecessary GOT pointer setups or whatever if it 430 finds all calls are local and there's no global data references. 431 Documented entrypoints would be <code>protected</code>, and purely 432 internal things not wanted by test programs or anything can be 433 <code>internal</code>. 434 435 <p> Unfortunately, on i386 it seems <code>protected</code> ends up causing 436 text segment relocations within libgmp.so, meaning the library code can't 437 be shared between processes, defeating the purpose of a shared library. 438 Perhaps this is just a gremlin in binutils (debian packaged 439 2.13.90.0.16-1). 440 441 <p> The linker can be told directly (with a link script, or options) to do 442 the same sort of thing. This doesn't change the code emitted by gcc of 443 course, but it does mean calls are resolved directly to their targets, 444 avoiding a PLT entry. 445 446 <p> Keeping symbols private to libgmp.so is probably a good thing in general 447 too, to stop anyone even attempting to access them. But some 448 undocumented things will need or want to be kept visible, for use by 449 mpfr, or the test and tune programs. Libtool has a standard option for 450 selecting public symbols (used now for libmp). 451 452 453<li> <strong>Math functions for the mpf layer</strong> 454 455 <p> Implement the functions of math.h for the GMP mpf layer! Check the book 456 "Pi and the AGM" by Borwein and Borwein for ideas how to do this. These 457 functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, 458 cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh. 459 460 <p> Note that the <a href="http://mpfr.org">mpfr</a> functions already 461 provide these functions, and that we usually recommend new programs to use 462 mpfr instead of mpf. 463</ul> 464<hr> 465 466</body> 467</html> 468 469<!-- 470Local variables: 471eval: (add-hook 'write-file-hooks 'time-stamp) 472time-stamp-start: "This file current as of " 473time-stamp-format: "%:d %3b %:y" 474time-stamp-end: "\\." 475time-stamp-line-limit: 50 476End: 477--> 478