<appendix xmlns="http://docbook.org/ns/docbook" version="5.0"
          xml:id="appendix.contrib" xreflabel="Contributing">
<?dbhtml filename="appendix_contributing.html"?>

<info><title>
  Contributing
  <indexterm>
    <primary>Appendix</primary>
    <secondary>Contributing</secondary>
  </indexterm>
</title>
  <keywordset>
    <keyword>ISO C++</keyword>
    <keyword>library</keyword>
  </keywordset>
</info>



<para>
  The GNU C++ Library is part of GCC and follows the same development model,
  so the general rules for
  <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/contribute.html">contributing
  to GCC</link> apply. Active
  contributors are assigned maintainership responsibility, and given
  write access to the source repository. First-time contributors
  should follow this procedure:
</para>

<section xml:id="contrib.list" xreflabel="Contributor Checklist"><info><title>Contributor Checklist</title></info>


  <section xml:id="list.reading"><info><title>Reading</title></info>


    <itemizedlist>
      <listitem>
        <para>
          Get and read the relevant sections of the C++ language
          specification. Copies of the full ISO 14882 standard are
          available online via the ISO mirror site for committee
          members. Non-members, or those who have not paid for the
          privilege of sitting on the committee and sustained their
          two-meeting commitment for voting rights, may get a copy of
          the standard from their respective national standards
          organization. In the USA, this national standards
          organization is
          <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.ansi.org">ANSI</link>.
          (And if you've already registered with them you can
          <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://webstore.ansi.org/RecordDetail.aspx?sku=INCITS%2fISO%2fIEC+14882-2012">buy the standard online</link>.)
        </para>
      </listitem>

      <listitem>
        <para>
          The library working group's bug reports and known defects can
          be obtained here:
          <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/">http://www.open-std.org/jtc1/sc22/wg21</link>
        </para>
      </listitem>

      <listitem>
        <para>
          Peruse
          the <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.gnu.org/prep/standards/">GNU
          Coding Standards</link>, and chuckle when you hit the part
          about <quote>Using Languages Other Than C</quote>.
        </para>
      </listitem>

      <listitem>
        <para>
          Be familiar with the extensions that preceded these
          general GNU rules. These style issues for libstdc++ can be
          found in <link linkend="contrib.coding_style">Coding Style</link>.
        </para>
      </listitem>

      <listitem>
        <para>
          And last but certainly not least, read the
          library-specific information found in
          <link linkend="appendix.porting">Porting and Maintenance</link>.
        </para>
      </listitem>
    </itemizedlist>

  </section>
  <section xml:id="list.copyright"><info><title>Assignment</title></info>

    <para>
      See the <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/contribute.html#legal">legal prerequisites</link> for all GCC contributions.
    </para>

    <para>
      Historically, the libstdc++ assignment form added the following
      question:
    </para>

    <para>
      <quote>
        Which Belgian comic book character is better, Tintin or Asterix, and
        why?
      </quote>
    </para>

    <para>
      While not strictly necessary, humoring the maintainers and answering
      this question would be appreciated.
    </para>

    <para>
      Please contact
      Paolo Carlini at <email>paolo.carlini@oracle.com</email>
      or
      Jonathan Wakely at <email>jwakely+assign@redhat.com</email>
      if you are confused about the assignment or have general licensing
      questions.
      When requesting an assignment form from
      <email>assign@gnu.org</email>, please CC the libstdc++
      maintainers above so that progress can be monitored.
    </para>
  </section>

  <section xml:id="list.getting"><info><title>Getting Sources</title></info>

    <para>
      <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/svnwrite.html">Getting write access
        (look for "Write after approval")</link>
    </para>
  </section>

  <section xml:id="list.patches"><info><title>Submitting Patches</title></info>


    <para>
      Every patch must have several pieces of information before it can be
      properly evaluated. Ideally (and to ensure the fastest possible
      response from the maintainers) it would have all of these pieces:
    </para>

    <itemizedlist>
      <listitem>
        <para>
          A description of the bug and how your patch fixes this
          bug. For new features, a description of the feature and your
          implementation.
        </para>
      </listitem>

      <listitem>
        <para>
          A ChangeLog entry as plain text; see the various
          ChangeLog files for format and content. If you are
          using Emacs as your editor, simply position the insertion
          point at the beginning of your change and type C-x 4 a to bring
          up the appropriate ChangeLog entry. See--magic! Similar
          functionality also exists for vi.
        </para>
      </listitem>

      <listitem>
        <para>
          A testsuite submission or sample program that will
          easily and simply show the existing error or test new
          functionality.
        </para>
      </listitem>

      <listitem>
        <para>
          The patch itself. If you are accessing the SVN
          repository use <command>svn update; svn diff NEW</command>;
          else, use <command>diff -cp OLD NEW</command>. If your
          version of diff does not support these options, then get the
          latest version of GNU
          diff.
          The <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/wiki/SvnTricks">SVN
          Tricks</link> wiki page has information on customising the
          output of <code>svn diff</code>.
        </para>
      </listitem>

      <listitem>
        <para>
          When you have all these pieces, bundle them up in a
          mail message and send it to libstdc++@gcc.gnu.org. All
          patches and related discussion should be sent to the
          libstdc++ mailing list.
        </para>
      </listitem>
    </itemizedlist>

  </section>

</section>

<section xml:id="contrib.organization" xreflabel="Source Organization"><info><title>Directory Layout and Source Conventions</title></info>
  <?dbhtml filename="source_organization.html"?>


  <para>
    The unpacked source directory of libstdc++ contains the files
    needed to create the GNU C++ Library.
  </para>

  <literallayout class="normal">
It has subdirectories:

  doc
    Files in HTML and text format that document usage, quirks of the
    implementation, and contributor checklists.

  include
    All header files for the C++ library are within this directory,
    modulo specific runtime-related files that are in the libsupc++
    directory.

    include/std
      Files meant to be found by #include <name> directives in
      standard-conforming user programs.

    include/c
      Headers intended to directly include standard C headers.
      [NB: this can be enabled via --enable-cheaders=c]

    include/c_global
      Headers intended to include standard C headers in
      the global namespace, and put select names into the std::
      namespace. [NB: this is the default, and is the same as
      --enable-cheaders=c_global]

    include/c_std
      Headers intended to include standard C headers
      already in namespace std, and put select names into the std::
      namespace.
      [NB: this is the same as --enable-cheaders=c_std]

    include/bits
      Files included by standard headers and by other files in
      the bits directory.

    include/backward
      Headers provided for backward compatibility, such as <iostream.h>.
      They are not used in this library.

    include/ext
      Headers that define extensions to the standard library. No
      standard header refers to any of them.

  scripts
    Scripts that are used during the configure, build, make, or test
    process.

  src
    Files that are used in constructing the library, but are not
    installed.

  testsuites/[backward, demangle, ext, performance, thread, 17_* to 30_*]
    Test programs are here, and may be used to begin to exercise the
    library. Support for "make check" and "make check-install" is
    complete, and runs through all the subdirectories here when this
    command is issued from the build directory. Please note that
    "make check" requires DejaGNU 1.4 or later to be installed. Please
    note that "make check-script" calls the script mkcheck, which
    requires bash, and which may need the paths to bash adjusted to
    work properly, as /bin/bash is assumed.

Other subdirectories contain variant versions of certain files
that are meant to be copied or linked by the configure script.
Currently these are:

  config/abi
  config/cpu
  config/io
  config/locale
  config/os

In addition, a subdirectory holds the convenience library libsupc++.

  libsupc++
    Contains the runtime library for C++, including exception
    handling and memory allocation and deallocation, RTTI, terminate
    handlers, etc.

Note that glibc also has a bits/ subdirectory. We will either
need to be careful not to collide with names in its bits/
directory; or rename bits to (e.g.) cppbits/.

In files throughout the system, lines marked with an "XXX" indicate
a bug or incompletely-implemented feature.
Lines marked "XXX MT"
indicate a place that may require attention for multi-thread safety.
  </literallayout>

</section>

<section xml:id="contrib.coding_style" xreflabel="Coding Style"><info><title>Coding Style</title></info>
  <?dbhtml filename="source_code_style.html"?>

  <para>
  </para>
  <section xml:id="coding_style.bad_identifiers"><info><title>Bad Identifiers</title></info>

    <para>
      Identifiers that conflict and should be avoided.
    </para>

    <literallayout class="normal">
      This is the list of names <quote>reserved to the
      implementation</quote> that have been claimed by certain
      compilers and system headers of interest, and should not be used
      in the library. It will grow, of course. We generally are
      interested in names that are not all-caps, except for those like
      "_T"

      For Solaris:
      _B
      _C
      _L
      _N
      _P
      _S
      _U
      _X
      _E1
      ..
      _E24

      Irix adds:
      _A
      _G

      MS adds:
      _T

      BSD adds:
      __used
      __unused
      __inline
      _Complex
      __istype
      __maskrune
      __tolower
      __toupper
      __wchar_t
      __wint_t
      _res
      _res_ext
      __tg_*

      SPU adds:
      __ea

      For GCC:

      [Note that this list is out of date. It applies to the old
      name-mangling; in G++ 3.0 and higher a different name-mangling is
      used. In addition, many of the bugs relating to G++ interpreting
      these names as operators have been fixed.]

      The full set of __* identifiers (combined from gcc/cp/lex.c and
      gcc/cplus-dem.c) that are either old or new, but are definitely
      recognized by the demangler, is:

      __aa
      __aad
      __ad
      __addr
      __adv
      __aer
      __als
      __alshift
      __amd
      __ami
      __aml
      __amu
      __aor
      __apl
      __array
      __ars
      __arshift
      __as
      __bit_and
      __bit_ior
      __bit_not
      __bit_xor
      __call
      __cl
      __cm
      __cn
      __co
      __component
      __compound
      __cond
      __convert
      __delete
      __dl
      __dv
      __eq
      __er
      __ge
      __gt
      __indirect
      __le
      __ls
      __lt
      __max
      __md
      __method_call
      __mi
      __min
      __minus
      __ml
      __mm
      __mn
      __mult
      __mx
      __ne
      __negate
      __new
      __nop
      __nt
      __nw
      __oo
      __op
      __or
      __pl
      __plus
      __postdecrement
      __postincrement
      __pp
      __pt
      __rf
      __rm
      __rs
      __sz
      __trunc_div
      __trunc_mod
      __truth_andif
      __truth_not
      __truth_orif
      __vc
      __vd
      __vn

      SGI badnames:
      __builtin_alloca
      __builtin_fsqrt
      __builtin_sqrt
      __builtin_fabs
      __builtin_dabs
      __builtin_cast_f2i
      __builtin_cast_i2f
      __builtin_cast_d2ll
      __builtin_cast_ll2d
      __builtin_copy_dhi2i
      __builtin_copy_i2dhi
      __builtin_copy_dlo2i
      __builtin_copy_i2dlo
      __add_and_fetch
      __sub_and_fetch
      __or_and_fetch
      __xor_and_fetch
      __and_and_fetch
      __nand_and_fetch
      __mpy_and_fetch
      __min_and_fetch
      __max_and_fetch
      __fetch_and_add
      __fetch_and_sub
      __fetch_and_or
      __fetch_and_xor
      __fetch_and_and
      __fetch_and_nand
      __fetch_and_mpy
      __fetch_and_min
      __fetch_and_max
      __lock_test_and_set
      __lock_release
      __lock_acquire
      __compare_and_swap
      __synchronize
      __high_multiply
      __unix
      __sgi
      __linux__
      __i386__
      __i486__
      __cplusplus
      __embedded_cplusplus
      // long double conversion members
      // mangled as __opr
      // http://gcc.gnu.org/ml/libstdc++/1999-q4/msg00060.html
      __opr
    </literallayout>
  </section>

  <section xml:id="coding_style.example"><info><title>By Example</title></info>

    <literallayout class="normal">
      This library is written to appropriate C++ coding standards. As such,
      its conventions are intended to take precedence over the
      recommendations of the GNU Coding Standard, which can be referenced
      in full here:

      <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.gnu.org/prep/standards/standards.html#Formatting">http://www.gnu.org/prep/standards/standards.html#Formatting</link>

      The rest of this is also interesting reading, but skip the "Design
      Advice" part.

      The GCC coding conventions are here, and are also useful:
      <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/codingconventions.html">http://gcc.gnu.org/codingconventions.html</link>

      In addition, because it doesn't seem to be stated explicitly anywhere
      else, there is an 80 column source limit.

      <filename>ChangeLog</filename> entries for member functions should use the
      classname::member function name syntax as follows:

<code>
1999-04-15  Dennis Ritchie  <dr@att.com>

        * src/basic_file.cc (__basic_file::open): Fix thinko in
        _G_HAVE_IO_FILE_OPEN bits.
</code>

      Notable areas of divergence from what may be previous local practice
      (particularly for GNU C) include:

      01. Pointers and references
      <code>
        char* p = "flop";
        char& c = *p;
          -NOT-
        char *p = "flop";  // wrong
        char &c = *p;      // wrong
      </code>

      Reason: In C++, definitions are mixed with executable code. Here,
      <code>p</code> is being initialized, not <code>*p</code>. This is near-universal
      practice among C++ programmers; it is normal for C hackers
      to switch spontaneously as they gain experience.

      02. Operator names and parentheses
      <code>
        operator==(type)
          -NOT-
        operator == (type)  // wrong
      </code>

      Reason: The <code>==</code> is part of the function name. Separating
      it makes the declaration look like an expression.

      03. Function names and parentheses
      <code>
        void mangle()
          -NOT-
        void mangle ()  // wrong
      </code>

      Reason: no space before parentheses (except after a control-flow
      keyword) is near-universal practice for C++. It identifies the
      parentheses as the function-call operator or declarator, as
      opposed to an expression or other overloaded use of parentheses.

      04. Template function indentation
      <code>
        template<typename T>
          void
          template_function(args)
          { }
          -NOT-
        template<class T>
        void template_function(args) {};
      </code>

      Reason: In class definitions, without indentation whitespace is
      needed both above and below the declaration to distinguish
      it visually from other members. (Also, re: "typename"
      rather than "class".) <code>T</code> often could be <code>int</code>, which is
      not a class. ("class", here, is an anachronism.)

      05. Template class indentation
      <code>
        template<typename _CharT, typename _Traits>
          class basic_ios : public ios_base
          {
          public:
            // Types:
          };
          -NOT-
        template<class _CharT, class _Traits>
        class basic_ios : public ios_base
          {
          public:
            // Types:
          };
          -NOT-
        template<class _CharT, class _Traits>
          class basic_ios : public ios_base
        {
          public:
            // Types:
        };
      </code>

      06. Enumerators
      <code>
        enum
        {
          space = _ISspace,
          print = _ISprint,
          cntrl = _IScntrl
        };
          -NOT-
        enum { space = _ISspace, print = _ISprint, cntrl = _IScntrl };
      </code>

      07. Member initialization lists
      All one line, separate from class name.

      <code>
        gribble::gribble()
        : _M_private_data(0), _M_more_stuff(0), _M_helper(0)
        { }
          -NOT-
        gribble::gribble() : _M_private_data(0), _M_more_stuff(0), _M_helper(0)
        { }
      </code>

      08. Try/Catch blocks
      <code>
        try
          {
            //
          }
        catch (...)
          {
            //
          }
          -NOT-
        try {
          //
        } catch(...) {
          //
        }
      </code>

      09. Member functions declarations and definitions
      Keywords such as extern, static, export, explicit, inline, etc.
      go on the line above the function name. Thus

      <code>
        virtual int
        foo()
          -NOT-
        virtual int foo()
      </code>

      Reason: GNU coding conventions dictate that, for definitions, the
      return type of a function goes on a separate line from the function
      name and parameter list. For C++, where we have member functions that
      can be either inline definitions or declarations, keeping to this
      standard allows all member function names for a given class to be
      aligned to the same margin, increasing readability.


      10. Invocation of member functions with "this->"
      For non-uglified names, use <code>this->name</code> to call the function.

      <code>
        this->sync()
          -NOT-
        sync()
      </code>

      Reason: Koenig lookup.

      11. Namespaces
      <code>
        namespace std
        {
          blah blah blah;
        } // namespace std

          -NOT-

        namespace std {
          blah blah blah;
        } // namespace std
      </code>

      12. Spacing under protected and private in class declarations:
      space above, none below
      i.e.

      <code>
         public:
           int foo;

          -NOT-
         public:

           int foo;
      </code>

      13. Spacing WRT return statements.
      no extra spacing before returns, no parentheses
      i.e.

      <code>
          }
        return __ret;

          -NOT-
          }

        return __ret;

          -NOT-

          }
        return (__ret);
      </code>


      14. Location of global variables.
      All global variables of class type, whether in the "user visible"
      space (e.g., <code>cin</code>) or the implementation namespace, must be defined
      as a character array with the appropriate alignment and then later
      re-initialized to the correct value.

      This is due to startup issues on certain platforms, such as AIX.
      For more explanation and examples, see <filename>src/globals.cc</filename>. All such
      variables should be contained in that file, for simplicity.

      15. Exception abstractions
      Use the exception abstractions found in <filename class="headerfile">functexcept.h</filename>, which allow
      C++ programmers to use this library with <literal>-fno-exceptions</literal>. (Even if
      that is rarely advisable, it's a necessary evil for backwards
      compatibility.)

      16. Exception error messages
      All start with the name of the function where the exception is
      thrown, and then (optional) descriptive text is added. Example:

      <code>
        __throw_logic_error(__N("basic_string::_S_construct NULL not valid"));
      </code>

      Reason: The verbose terminate handler prints out <code>exception::what()</code>,
      as well as the typeinfo for the thrown exception. As this is the
      default terminate handler, by putting location info into the
      exception string, a very useful error message is printed out for
      uncaught exceptions. So useful, in fact, that non-programmers can
      give useful error messages, and programmers can intelligently
      speculate what went wrong without even using a debugger.

      17. The doxygen style guide to comments is a separate document,
      see index.

      The library currently has a mixture of GNU-C and modern C++ coding
      styles. The GNU C usages will be combed out gradually.

      Name patterns:

      For nonstandard names appearing in Standard headers, we are constrained
      to use names that begin with underscores. This is called "uglification".
      The convention is:

      Local and argument names:  <literal>__[a-z].*</literal>

      Examples:  <code>__count  __ix  __s1</code>

      Type names and template formal-argument names: <literal>_[A-Z][^_].*</literal>

      Examples:  <code>_Helper  _CharT  _N</code>

      Member data and function names: <literal>_M_.*</literal>

      Examples:  <code>_M_num_elements  _M_initialize ()</code>

      Static data members, constants, and enumerations: <literal>_S_.*</literal>

      Examples: <code>_S_max_elements  _S_default_value</code>

      Don't use names in the same scope that differ only in the prefix,
      e.g. _S_top and _M_top. See BADNAMES for a list of forbidden names.
      (The most tempting of these seem to be "_T" and "__sz".)

      Names must never have "__" internally; it would confuse name
      unmanglers on some targets. Also, never use "__[0-9]", same reason.

      --------------------------

      [BY EXAMPLE]
      <code>

      #ifndef  _HEADER_
      #define  _HEADER_ 1

      namespace std
      {
        class gribble
        {
        public:
          gribble() throw();

          gribble(const gribble&);

          explicit
          gribble(int __howmany);

          gribble&
          operator=(const gribble&);

          virtual
          ~gribble() throw();

          // Start with a capital letter, end with a period.
          inline void
          public_member(const char* __arg) const;

          // In-class function definitions should be restricted to one-liners.
          int
          one_line() { return 0; }

          int
          two_lines(const char* arg)
          { return strchr(arg, 'a') != 0; }

          inline int
          three_lines();  // inline, but defined below.

          // Note indentation.
          template<typename _Formal_argument>
            void
            public_template() const throw();

          template<typename _Iterator>
            void
            other_template();

        private:
          class _Helper;

          int _M_private_data;
          int _M_more_stuff;
          _Helper* _M_helper;
          int _M_private_function();

          enum _Enum
            {
              _S_one,
              _S_two
            };

          static void
          _S_initialize_library();
        };

      // More-or-less-standard language features described by lack, not presence.
      # ifndef _G_NO_LONGLONG
        extern long long _G_global_with_a_good_long_name;  // avoid globals!
      # endif

        // Avoid in-class inline definitions, define separately;
        // likewise for member class definitions:
        inline int
        gribble::public_member() const
        { int __local = 0; return __local; }

        class gribble::_Helper
        {
          int _M_stuff;

          friend class gribble;
        };
      }

      // Names beginning with "__": only for arguments and
      //   local variables; never use "__" in a type name, or
      //   within any name; never use "__[0-9]".

      #endif /* _HEADER_ */


      namespace std
      {
        template<typename T>  // notice: "typename", not "class", no space
          long_return_value_type<with_many, args>
          function_name(char* pointer,               // "char *pointer" is wrong.
                        char* argument,
                        const Reference& ref)
          {
            // int a_local;  /* wrong; see below. */
            if (test)
            {
              nested code
            }

            int a_local = 0;  // declare variable at first use.

            //  char a, b, *p;   /* wrong */
            char a = 'a';
            char b = a + 1;
            char* c = "abc";  // each variable goes on its own line, always.

            // except maybe here...
            for (unsigned i = 0, mask = 1; mask; ++i, mask <<= 1) {
                // ...
            }
          }

        gribble::gribble()
        : _M_private_data(0), _M_more_stuff(0), _M_helper(0)
        { }

        int
        gribble::three_lines()
        {
          // doesn't fit in one line.
        }
      } // namespace std
      </code>
    </literallayout>
  </section>
</section>

<section xml:id="contrib.design_notes" xreflabel="Design Notes"><info><title>Design Notes</title></info>
  <?dbhtml filename="source_design_notes.html"?>

  <para>
  </para>

  <literallayout class="normal">

    The Library
    -----------

    This paper covers two major areas:

    - Features and policies not mentioned in the standard that
    the quality of the library implementation depends on, including
    extensions and "implementation-defined" features;

    - Plans for required but unimplemented library features and
    optimizations to them.

    Overhead
    --------

    The standard defines a large library, much larger than the standard
    C library. A naive implementation would suffer substantial overhead
    in compile time, executable size, and speed, rendering it unusable
    in many (particularly embedded) applications. The alternative demands
    care in construction, and some compiler support, but there is no
    need for library subsets.

    What are the sources of this overhead?  There are four main causes:

    - The library is specified almost entirely as templates, which
    with current compilers must be included in-line, resulting in
    very slow builds as tens or hundreds of thousands of lines
    of function definitions are read for each user source file.
    Indeed, the entire SGI STL, as well as the dos Reis valarray,
    are provided purely as header files, largely for simplicity in
    porting. Iostream/locale is (or will be) as large again.

    - The library is very flexible, specifying a multitude of hooks
    where users can insert their own code in place of defaults.
    When these hooks are not used, any time and code expended to
    support that flexibility is wasted.

    - Templates are often described as causing "code bloat".
    In practice, this refers (when it refers to anything real) to several
    independent processes. First, when a class template is manually
    instantiated in its entirety, current compilers place the definitions
    for all members in a single object file, so that a program linking
    to one member gets definitions of all. Second, template functions
    which do not actually depend on the template argument are, under
    current compilers, generated anew for each instantiation, rather
    than being shared with other instantiations. Third, some of the
    flexibility mentioned above comes from virtual functions (both in
    regular classes and template classes) which current linkers add
    to the executable file even when they manifestly cannot be called.

    - The library is specified to use a language feature, exceptions,
    which in the current gcc compiler ABI imposes a run time and
    code space cost to handle the possibility of exceptions even when
    they are not used. Under the new ABI (accessed with -fnew-abi),
    there is a space overhead and a small reduction in code efficiency
    resulting from lost optimization opportunities associated with
    non-local branches associated with exceptions.

    What can be done to eliminate this overhead?  A variety of coding
    techniques, and compiler, linker and library improvements and
    extensions may be used, as covered below. Most are not difficult,
    and some are already implemented in varying degrees.

    Overhead: Compilation Time
    --------------------------

    Providing "ready-instantiated" template code in object code archives
    allows us to avoid generating and optimizing template instantiations
    in each compilation unit which uses them. However, the number of such
    instantiations that are useful to provide is limited, and anyway this
    is not enough, by itself, to minimize compilation time.
    In particular,
    it does not reduce time spent parsing conforming headers.

    Quicker header parsing will depend on library extensions and compiler
    improvements. One approach is some variation on the techniques
    previously marketed as "pre-compiled headers", now standardized as
    support for the "export" keyword. "Exported" template definitions
    can be placed (once) in a "repository" -- really just a library, but
    of template definitions rather than object code -- to be drawn upon
    at link time when an instantiation is needed, rather than placed in
    header files to be parsed along with every compilation unit.

    Until "export" is implemented we can put some of the lengthy template
    definitions in #if guards or alternative headers so that users can skip
    over the full definitions when they need only the ready-instantiated
    specializations.

    To be precise, this means that certain headers which define
    templates which users normally use only for certain arguments
    can be instrumented to avoid exposing the template definitions
    to the compiler unless a macro is defined. For example, in
    <string>, we might have:

    template <class _CharT, ... > class basic_string {
    ... // member declarations
    };
    ... // operator declarations

    #ifdef _STRICT_ISO_
    # if _G_NO_TEMPLATE_EXPORT
    #   include <bits/std_locale.h>  // headers needed by definitions
    #   ...
    #   include <bits/string.tcc>  // member and global template definitions.
    # endif
    #endif

    Users who compile without specifying a strict-ISO-conforming flag
    would not see many of the template definitions they now see, and rely
    instead on ready-instantiated specializations in the library. This
    technique would be useful for the following substantial components:
    string, locale/iostreams, valarray.
    It would *not* be useful or
    usable with the following: containers, algorithms, iterators,
    allocator. Since these constitute a large (though decreasing)
    fraction of the library, the benefit the technique offers is
    limited.

    The language specifies the semantics of the "export" keyword, but
    the gcc compiler does not yet support it. When it does, problems
    with large template inclusions can largely disappear, given some
    minor library reorganization, along with the need for the apparatus
    described above.

    Overhead: Flexibility Cost
    --------------------------

    The library offers many places where users can specify operations
    to be performed by the library in place of defaults. Sometimes
    this seems to require that the library use a more-roundabout, and
    possibly slower, way to accomplish the default requirements than
    would be used otherwise.

    The primary protection against this overhead is thorough compiler
    optimization, to crush out layers of inline function interfaces.
    Kuck & Associates has demonstrated the practicality of this kind
    of optimization.

    The second line of defense against this overhead is explicit
    specialization. By defining helper function templates, and writing
    specialized code for the default case, overhead can be eliminated
    for that case without sacrificing flexibility. This takes full
    advantage of any ability of the optimizer to crush out degenerate
    code.

    The library specifies many virtual functions which current linkers
    load even when they cannot be called. Some minor improvements to the
    compiler and to ld would eliminate any such overhead by simply
    omitting virtual functions that the complete program does not call.
    A prototype of this work has already been done. For targets where
    GNU ld is not used, a "pre-linker" could do the same job.
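
    As an illustration of the "second line of defense" above (specialized
    code for the default case), here is a minimal sketch. It is not taken
    from the library: the names min_helper and min_of are hypothetical, and
    a helper class template is used only because function templates cannot
    be partially specialized. The generic path calls through whatever
    comparison object the user supplies; the specialization for std::less
    uses the built-in comparison directly, so the default case pays
    nothing for the flexibility.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper, generic path: every comparison goes through the
// user-supplied comparison object.
template<typename T, typename Compare>
  struct min_helper
  {
    static const T&
    pick(const T* first, std::size_t n, Compare comp)
    {
      const T* best = first;
      for (std::size_t i = 1; i < n; ++i)
        if (comp(first[i], *best))
          best = first + i;
      return *best;
    }
  };

// Specialized code for the default case: the comparison object is
// ignored and operator< is applied directly.
template<typename T>
  struct min_helper<T, std::less<T> >
  {
    static const T&
    pick(const T* first, std::size_t n, std::less<T>)
    {
      const T* best = first;
      for (std::size_t i = 1; i < n; ++i)
        if (first[i] < *best)
          best = first + i;
      return *best;
    }
  };

// The public interface stays fully flexible (n must be at least 1).
template<typename T, typename Compare>
  const T&
  min_of(const T* first, std::size_t n, Compare comp)
  { return min_helper<T, Compare>::pick(first, n, comp); }

// Default case: dispatches to the specialization above.
template<typename T>
  const T&
  min_of(const T* first, std::size_t n)
  { return min_of(first, n, std::less<T>()); }
```

    Callers see a single interface: min_of(data, n) takes the specialized
    path, while min_of(data, n, std::greater<int>()) takes the generic one.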

The main areas in the standard interface where user flexibility
can result in overhead are:

- Allocators: Containers are specified to use user-definable
  allocator types and objects, making tuning for the container
  characteristics tricky.

- Locales: the standard specifies locale objects used to implement
  iostream operations, involving many virtual functions which use
  streambuf iterators.

- Algorithms and containers: these may be instantiated on any type,
  frequently duplicating code for identical operations.

- Iostreams and strings: users are permitted to use these on their
  own types, and specify the operations the stream must use on these
  types.

Note that these sources of overhead are _avoidable_. The techniques
to avoid them are covered below.

Code Bloat
----------

In the SGI STL, and in some other headers, many of the templates
are defined "inline" -- either explicitly or by their placement
in class definitions -- which should not be inline. This is a
source of code bloat. Matt had remarked that he was relying on
the compiler to recognize what was too big to benefit from inlining,
and generate it out-of-line automatically. However, this also can
result in code bloat except where the linker can eliminate the extra
copies.

Fixing these cases will require an audit of all inline functions
defined in the library to determine which merit inlining, and moving
the rest out of line. This is an issue mainly in clauses 23, 25, and
27. Of course it can be done incrementally, and we should generally
accept patches that move large functions out of line and into ".tcc"
files, which can later be pulled into a repository.
Compiler/linker improvements to recognize very large inline functions
and move them out-of-line, but shared among compilation units, could
make this work unnecessary.

Pre-instantiating template specializations currently produces large
amounts of dead code which bloats statically linked programs. The
current state of the static library, libstdc++.a, is intolerable on
this account, and will fuel further confused speculation about a need
for a library "subset". A compiler improvement that treats each
instantiated function as a separate object file, for linking purposes,
would be one solution to this problem. An alternative would be to
split up the manual instantiation files into dozens upon dozens of
little files, each compiled separately, but an abortive attempt at
this was done for <string> and, though it is far from complete, it
is already a nuisance. A better interim solution (just until we have
"export") is badly needed.

When building a shared library, the current compiler/linker cannot
automatically generate the instantiations needed. This creates a
miserable situation; it means any time something is changed in the
library, before a shared library can be built someone must manually
copy the declarations of all templates that are needed by other parts
of the library to an "instantiation" file, and add it to the build
system to be compiled and linked to the library. This process is
readily automated, and should be automated as soon as possible.
Users building their own shared libraries experience identical
frustrations.

Sharing common aspects of template definitions among instantiations
can radically reduce code bloat.
The compiler could help a great deal here by recognizing when a
function depends on nothing about a template parameter, or only on
its size, and giving the resulting function a link-name "equate" that
allows it to be shared with other instantiations. Implementation code
could take advantage of the capability by factoring out code that does
not depend on the template argument into separate functions to be
merged by the compiler.

Until such a compiler optimization is implemented, much can be done
manually (if tediously) in this direction. One such optimization is
to derive class templates from non-template classes, and move as much
implementation as possible into the base class. Another is to partial-
specialize certain common instantiations, such as vector<T*>, to share
code for instantiations on all types T. While these techniques work,
they are far from the complete solution that a compiler improvement
would afford.

Overhead: Expensive Language Features
-------------------------------------

The main "expensive" language feature used in the standard library
is exception support, which requires compiling in cleanup code with
static table data to locate it, and linking in library code to use
the table. For small embedded programs the amount of such library
code and table data is assumed by some to be excessive. Under the
"new" ABI this perception is generally exaggerated, although in some
cases it may actually be excessive.

To implement a library which does not use exceptions directly is
not difficult given minor compiler support (to "turn off" exceptions
and ignore exception constructs), and results in no great library
maintenance difficulties. To be precise, given "-fno-exceptions",
the compiler should treat "try" blocks as ordinary blocks, and
"catch" blocks as dead code to ignore or eliminate.
Compiler support is not strictly necessary, except in the case of
"function try blocks"; otherwise the following macros almost suffice:

  #define throw(X)
  #define try      if (true)
  #define catch(X) else if (false)

However, there may be a need to use function try blocks in the
library implementation, and use of macros in this way can make
correct diagnostics impossible. Furthermore, use of this scheme
would require the library to call a function to re-throw exceptions
from a try block. Implementing the above semantics in the compiler
is preferable.

Given the support above (however implemented) it only remains to
replace code that "throws" with a call to a well-documented "handler"
function in a separate compilation unit which may be replaced by
the user. The main source of exceptions that would be difficult
for users to avoid is memory allocation failures, but users can
define their own memory allocation primitives that never throw.
Otherwise, the complete list of such handlers, and which library
functions may call them, would be needed for users to be able to
implement the necessary substitutes. (Fortunately, they have the
source code.)

Opportunities
-------------

The template capabilities of C++ offer enormous opportunities for
optimizing common library operations, well beyond what would be
considered "eliminating overhead". In particular, many operations
done in Glibc with macros that depend on proprietary language
extensions can be implemented in pristine Standard C++. For example,
the chapter 25 algorithms, and even C library functions such as strchr,
can be specialized for the case of static arrays of known (small) size.

Detailed optimization opportunities are identified below where
the component where they would appear is discussed.
Of course new opportunities will be identified during implementation.

Unimplemented Required Library Features
---------------------------------------

The standard specifies hundreds of components, grouped broadly by
chapter. These are listed in excruciating detail in the CHECKLIST
file.

  17 general
  18 support
  19 diagnostics
  20 utilities
  21 string
  22 locale
  23 containers
  24 iterators
  25 algorithms
  26 numerics
  27 iostreams
  Annex D  backward compatibility

Anyone participating in implementation of the library should obtain
a copy of the standard, ISO 14882. People in the U.S. can obtain an
electronic copy for US$18 from ANSI's web site. Those from other
countries should visit http://www.iso.org/ to find out the location
of their country's representation in ISO, in order to know who can
sell them a copy.

The emphasis in the following sections is on unimplemented features
and optimization opportunities.

Chapter 17  General
-------------------

Chapter 17 concerns overall library requirements.

The standard doesn't mention threads. A multi-thread (MT) extension
primarily affects operators new and delete (18), allocator (20),
string (21), locale (22), and iostreams (27). The common underlying
support needed for this is discussed under chapter 20.

The standard requirements on names from the C headers create a
lot of work, mostly done. Names in the C headers must be visible
in the std:: and sometimes the global namespace; the names in the
two scopes must refer to the same object. More stringent is that
Koenig lookup implies that any types specified as defined in std::
really are defined in std::. Names optionally implemented as
macros in C cannot be macros in C++. (An overview may be read at
<http://www.cantrip.org/cheaders.html>).
The scripts "inclosure" and "mkcshadow", and the directories shadow/
and cshadow/, are the beginning of an effort to conform in this area.

A correct conforming definition of C header names based on underlying
C library headers, and practical linking of conforming namespaced
customer code with third-party C libraries depends ultimately on
an ABI change, allowing namespaced C type names to be mangled into
type names as if they were global, somewhat as C function names in a
namespace, or C++ global variable names, are left unmangled. Perhaps
another "extern" mode, such as 'extern "C-global"' would be an
appropriate place for such type definitions. Such a type would
affect mangling as follows:

  namespace A {
    struct X {};
    extern "C-global" {  // or maybe just 'extern "C"'
      struct Y {};
    };
  }
  void f(A::X*);  // mangles to f__FPQ21A1X
  void f(A::Y*);  // mangles to f__FP1Y

(It may be that this is really the appropriate semantics for regular
'extern "C"', and 'extern "C-global"', as an extension, would not be
necessary.) This would allow functions declared in non-standard C headers
(and thus fixable by neither us nor users) to link properly with functions
declared using C types defined in properly-namespaced headers. The
problem this solves is that C headers (which C++ programmers do persist
in using) frequently forward-declare C struct tags without including
the header where the type is defined, as in

  struct tm;
  void munge(tm*);

Without some compiler accommodation, munge cannot be called by correct
C++ code using a pointer to a correctly-scoped tm* value.

The current C headers use the preprocessor extension "#include_next",
which the compiler complains about when run "-pedantic".
(Incidentally, it appears that "-fpedantic" is currently ignored,
probably a bug.)
The solution in the C compiler is to use "-isystem" rather than "-I",
but unfortunately in g++ this seems also to wrap the whole header in
an 'extern "C"' block, so it's unusable for C++ headers. The correct
solution appears to be to
allow the various special include-directory options, if not given
an argument, to affect subsequent include-directory options additively,
so that if one said

  -pedantic -iprefix $(prefix) \
  -idirafter -ino-pedantic -ino-extern-c -iwithprefix -I g++-v3 \
  -iwithprefix -I g++-v3/ext

the compiler would search $(prefix)/g++-v3 and not report
pedantic warnings for files found there, but treat files in
$(prefix)/g++-v3/ext pedantically. (The undocumented semantics
of "-isystem" in g++ stink. Can they be rescinded? If not it
must be replaced with something more rationally behaved.)

All the C headers need the treatment above; in the standard these
headers are mentioned in various clauses. Below, I have only
mentioned those that present interesting implementation issues.

The components identified as "mostly complete", below, have not been
audited for conformance. In many cases where the library passes
conformance tests we have non-conforming extensions that must be
wrapped in #if guards for "pedantic" use, and in some cases renamed
in a conforming way for continued use in the implementation regardless
of conformance flags.

The STL portion of the library still depends on a header
stl/bits/stl_config.h full of #ifdef clauses. This apparatus
should be replaced with autoconf/automake machinery.

The SGI STL defines a type_traits<> template, specialized for
many types in their code including the built-in numeric and
pointer types and some library types, to direct optimizations of
standard functions.
The SGI compiler has been extended to generate specializations of this
template automatically for user types,
so that use of STL templates on user types can take advantage of
these optimizations. Specializations for other, non-STL, types
would make more optimizations possible, but extending the gcc
compiler in the same way would be much better. Probably the next
round of standardization will ratify this, but probably with
changes, so it probably should be renamed to place it in the
implementation namespace.

The SGI STL also defines a large number of extensions visible in
standard headers. (Other extensions that appear in separate headers
have been sequestered in subdirectories ext/ and backward/.) All
these extensions should be moved to other headers where possible,
and in any case wrapped in a namespace (not std!), and (where kept
in a standard header) girded about with macro guards. Some cannot be
moved out of standard headers because they are used to implement
standard features. The canonical method for accommodating these
is to use a protected name, aliased in macro guards to a user-space
name. Unfortunately C++ offers no satisfactory template typedef
mechanism, so very ad-hoc and unsatisfactory aliasing must be used
instead.

Implementation of a template typedef mechanism should have the highest
priority among possible extensions, on the same level as implementation
of the template "export" feature.

Chapter 18  Language support
----------------------------

Headers: <limits> <new> <typeinfo> <exception>
C headers: <cstddef> <climits> <cfloat> <cstdarg> <csetjmp>
           <ctime> <csignal> <cstdlib> (also 21, 25, 26)

This defines the built-in exceptions, rtti, numeric_limits<>,
operator new and delete. Much of this is provided by the
compiler in its static runtime library.

Work to do includes defining numeric_limits<> specializations in
separate files for all target architectures. Values for integer types
except for bool and wchar_t are readily obtained from the C header
<limits.h>, but values for the remaining numeric types (bool, wchar_t,
float, double, long double) must be entered manually. This is
largely dog work except for those members whose values are not
easily deduced from available documentation. Also, this involves
some work in target configuration to identify the correct choice of
file to build against and to install.

The definitions of the various operators new and delete must be
made thread-safe, which depends on a portable exclusion mechanism,
discussed under chapter 20. Of course there is always plenty of
room for improvements to the speed of operators new and delete.

<cstdarg>, in Glibc, defines some macros that gcc does not allow to
be wrapped into an inline function. Probably this header will demand
attention whenever a new target is chosen. The functions atexit(),
exit(), and abort() in cstdlib have different semantics in C++, so
must be re-implemented for C++.

Chapter 19  Diagnostics
-----------------------

Headers: <stdexcept>
C headers: <cassert> <cerrno>

This defines the standard exception objects, which are "mostly complete".
Cygnus has a version, and now SGI provides a slightly different one.
It makes little difference which we use.

The C global name "errno", which C allows to be a variable or a macro,
is required in C++ to be a macro. For MT it must typically result in
a function call.

Chapter 20  Utilities
---------------------
Headers: <utility> <functional> <memory>
C header: <ctime> (also in 18)

SGI STL provides "mostly complete" versions of all the components
defined in this chapter.
However, the auto_ptr<> implementation is known to be wrong.
Furthermore, the standard definition of it
is known to be unimplementable as written. A minor change to the
standard would fix it, and auto_ptr<> should be adjusted to match.

Multi-threading affects the allocator implementation, and there must
be configuration/installation choices for different users' MT
requirements. Anyway, users will want to tune allocator options
to support different target conditions, MT or no.

The primitives used for MT implementation should be exposed, as an
extension, for users' own work. We need cross-CPU "mutex" support,
multi-processor shared-memory atomic integer operations, and single-
processor uninterruptible integer operations, and all three configurable
to be stubbed out for non-MT use, or to use an appropriately-loaded
dynamic library for the actual runtime environment, or statically
compiled in for cases where the target architecture is known.

Chapter 21  String
------------------
Headers: <string>
C headers: <cctype> <cwctype> <cstring> <cwchar> (also in 27)
           <cstdlib> (also in 18, 25, 26)

We have "mostly-complete" char_traits<> implementations. Many of the
char_traits<char> operations might be optimized further using existing
proprietary language extensions.

We have a "mostly-complete" basic_string<> implementation. The work
to manually instantiate char and wchar_t specializations in object
files to improve link-time behavior is extremely unsatisfactory,
literally tripling library-build time with no commensurate improvement
in static program link sizes. It must be redone. (Similar work is
needed for some components in clauses 22 and 27.)

Other work needed for strings is MT-safety, as discussed under the
chapter 20 heading.
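
For readers unfamiliar with the manual instantiation of char and
wchar_t specializations mentioned above, the mechanism can be sketched
as follows. The class basic_counter is a hypothetical stand-in, far
smaller than basic_string<>; the comments mark which parts would
normally live in a header, a ".tcc" file, and an instantiation file.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Header: the template is declared, its members are not defined here.
template <class CharT>
struct basic_counter {
  static std::size_t length(const CharT* s);
};

// ".tcc" file: out-of-line member definitions.
template <class CharT>
std::size_t basic_counter<CharT>::length(const CharT* s)
{
  assert(s != nullptr);
  std::size_t n = 0;
  while (s[n] != CharT())
    ++n;
  return n;
}

// Instantiation file: explicit instantiation definitions place the
// code for these two specializations in this object file, so users
// of them need not see the definitions above.
template struct basic_counter<char>;
template struct basic_counter<wchar_t>;

} // namespace sketch
```

A call such as sketch::basic_counter<char>::length("abc") then links
against the ready-instantiated code.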

The standard C type mbstate_t from <cwchar> and used in char_traits<>
must be different in C++ than in C, because in C++ the default constructor
value mbstate_t() must be the "base" or "ground" sequence state.
(According to the likely resolution of a recently raised Core issue,
this may become unnecessary. However, there are other reasons to
use a state type not as limited as whatever the C library provides.)
If we might want to provide conversions from (e.g.) internally-
represented EUC-wide to externally-represented Unicode, or vice-
versa, the mbstate_t we choose will need to be more accommodating
than what might be provided by an underlying C library.

There remain some basic_string template-member functions which do
not overload properly with their non-template brethren. The infamous
hack akin to what was done in vector<> is needed, to conform to
23.1.1 para 10. The CHECKLIST items for basic_string marked 'X',
or incomplete, are so marked for this reason.

Replacing the string iterators, which currently are simple character
pointers, with class objects would greatly increase the safety of the
client interface, and also permit a "debug" mode in which range,
ownership, and validity are rigorously checked. The current use of
raw pointers as string iterators is evil. vector<> iterators need the
same treatment. Note that the current implementation freely mixes
pointers and iterators, and that must be fixed before safer iterators
can be introduced.

Some of the functions in <cstring> are different from the C version,
generally overloaded on const and non-const argument pointers. For
example, in <cstring> strchr is overloaded. The functions isupper
etc. in <cctype>, typically implemented as macros in C, are functions
in C++, because they are overloaded with others of the same name
defined in <locale>.
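
The const/non-const overloading described above for strchr can be
sketched as below. The function name find_char is hypothetical, used
so that the example does not collide with the real strchr.

```cpp
#include <cassert>

namespace sketch {

// Const overload: a search in read-only text yields a read-only result.
inline const char* find_char(const char* s, int c)
{
  assert(s != nullptr);
  const char ch = static_cast<char>(c);
  for (;; ++s) {
    if (*s == ch)
      return s;        // also matches the terminator when ch == '\0'
    if (*s == '\0')
      return nullptr;  // reached the end without a match
  }
}

// Non-const overload: delegates to the const version, then restores
// the mutability that the caller's argument already had.
inline char* find_char(char* s, int c)
{
  return const_cast<char*>(find_char(static_cast<const char*>(s), c));
}

} // namespace sketch
```

Unlike the single C signature, neither overload lets a caller obtain
a writable pointer into a string that was passed in as const.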

Many of the functions required in <cwctype> and <cwchar> cannot be
implemented using underlying C facilities on intended targets because
such facilities only partly exist.

Chapter 22  Locale
------------------
Headers: <locale>
C headers: <clocale>

We have a "mostly complete" class locale, with the exception of
code for constructing, and handling the names of, named locales.
The ways that locales are named (particularly when categories
(e.g. LC_TIME, LC_COLLATE) are different) vary among all target
environments. This code must be written in various versions and
chosen by configuration parameters.

Members of many of the facets defined in <locale> are stubs. Generally,
there are two sets of facets: the base class facets (which are supposed
to implement the "C" locale) and the "byname" facets, which are supposed
to read files to determine their behavior. The base ctype<>, collate<>,
and numpunct<> facets are "mostly complete", except that the table of
bitmask values used for "is" operations, and corresponding mask values,
are still defined in libio and just included/linked. (We will need to
implement these tables independently, soon, but should take advantage
of libio where possible.) The num_put<>::put members for integer types
are "mostly complete".

A complete list of what has and has not been implemented may be
found in CHECKLIST. However, note that the current definition of
codecvt<wchar_t,char,mbstate_t> is wrong. It should simply write
out the raw bytes representing the wide characters, rather than
trying to convert each to a corresponding single "char" value.

Some of the facets are more important than others.
Specifically, the members of ctype<>, numpunct<>, num_put<>, and
num_get<> facets
are used by other library facilities defined in <string>, <istream>,
and <ostream>, and the codecvt<> facet is used by basic_filebuf<>
in <fstream>, so a conforming iostream implementation depends on
these.

The "long long" type eventually must be supported, but code mentioning
it should be wrapped in #if guards to allow pedantic-mode compiling.

Performance of num_put<> and num_get<> depends critically on
caching computed values in ios_base objects, and on extensions
to the interface with streambufs.

Specifically: retrieving a copy of the locale object, extracting
the needed facets, and gathering data from them, for each call to
(e.g.) operator<< would be prohibitively slow. To cache format
data for use by num_put<> and num_get<> we have a _Format_cache<>
object stored in the ios_base::pword() array. This is constructed
and initialized lazily, and is organized purely for utility. It
is discarded when a new locale with different facets is imbued.

Using only the public interfaces of the iterator arguments to the
facet functions would limit performance by forbidding "vector-style"
character operations. The streambuf iterator optimizations are
described under chapter 24, but facets can also bypass the streambuf
iterators via explicit specializations and operate directly on the
streambufs, and use extended interfaces to get direct access to the
streambuf internal buffer arrays. These extensions are mentioned
under chapter 27. These optimizations are particularly important
for input parsing.

Unused virtual members of locale facets can be omitted, as mentioned
above, by a smart linker.
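
The lazy caching in ios_base::pword() described above can be
illustrated using only standard interfaces. This is a sketch, not the
library's _Format_cache<>: the type format_cache and the functions
cache_index, cache_event and get_cache are hypothetical, and for
simplicity the cache is discarded on every imbue rather than only
when the facets actually differ.

```cpp
#include <cassert>
#include <ios>
#include <locale>

namespace sketch {

// Hypothetical cache: two values normally fetched from numpunct<char>.
struct format_cache {
  char decimal_point;
  char thousands_sep;
};

// One iword/pword index, reserved once for all streams.
inline int cache_index()
{
  static const int index = std::ios_base::xalloc();
  return index;
}

// Callback keeping the cached pointer valid across stream events.
inline void cache_event(std::ios_base::event ev, std::ios_base& ios,
                        int index)
{
  void*& slot = ios.pword(index);
  if (ev == std::ios_base::copyfmt_event) {
    slot = nullptr;   // copied pointer belongs to the other stream
  } else {            // imbue_event or erase_event: discard the cache
    delete static_cast<format_cache*>(slot);
    slot = nullptr;
  }
}

// Returns the cache, building it from the locale on first use.
inline const format_cache& get_cache(std::ios_base& ios)
{
  const int index = cache_index();
  long& registered = ios.iword(index);
  if (!registered) {
    ios.register_callback(&cache_event, index);
    registered = 1;
  }
  void*& slot = ios.pword(index);
  if (slot == nullptr) {
    const std::numpunct<char>& np =
      std::use_facet<std::numpunct<char> >(ios.getloc());
    format_cache* c = new format_cache;
    c->decimal_point = np.decimal_point();
    c->thousands_sep = np.thousands_sep();
    slot = c;
  }
  assert(slot != nullptr);
  return *static_cast<format_cache*>(slot);
}

} // namespace sketch
```

Repeated calls to get_cache on the same stream return the same object
until imbue() is called, after which the next call rebuilds it from
the new locale.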

Chapter 23  Containers
----------------------
Headers: <deque> <list> <queue> <stack> <vector> <map> <set> <bitset>

All the components in chapter 23 are implemented in the SGI STL.
They are "mostly complete"; they include a large number of
nonconforming extensions which must be wrapped. Some of these
are used internally and must be renamed or duplicated.

The SGI components are optimized for large-memory environments. For
embedded targets, different criteria might be more appropriate. Users
will want to be able to tune this behavior. We should provide
ways for users to compile the library with different memory usage
characteristics.

A lot more work is needed on factoring out common code from different
specializations to reduce code size here and in chapter 25. The
easiest fix for this would be a compiler/ABI improvement that allows
the compiler to recognize when a specialization depends only on the
size (or other gross quality) of a template argument, and allow the
linker to share the code with similar specializations. In its
absence, many of the algorithms and containers can be partial-
specialized, at least for the case of pointers, but this only solves
a small part of the problem. Use of a type_traits-style template
allows a few more optimization opportunities, more if the compiler
can generate the specializations automatically.

As an optimization, containers can specialize on the default allocator
and bypass it, or take advantage of details of its implementation
after it has been improved upon.

Replacing the vector iterators, which currently are simple element
pointers, with class objects would greatly increase the safety of the
client interface, and also permit a "debug" mode in which range,
ownership, and validity are rigorously checked.
The current use of pointers for iterators is evil.

As mentioned for chapter 24, the deque iterator is a good example of
an opportunity to implement a "staged" iterator that would benefit
from specializations of some algorithms.

Chapter 24  Iterators
---------------------
Headers: <iterator>

Standard iterators are "mostly complete", with the exception of
the stream iterators, which are not yet templatized on the
stream type. Also, the base class template iterator<> appears
to be wrong, so everything derived from it must also be wrong,
currently.

The streambuf iterators (currently located in stl/bits/std_iterator.h,
but should be under bits/) can be rewritten to take advantage of
friendship with the streambuf implementation.

Matt Austern has identified opportunities where certain iterator
types, particularly including streambuf iterators and deque
iterators, have a "two-stage" quality, such that an intermediate
limit can be checked much more quickly than the true limit on
range operations. If identified with a member of iterator_traits,
algorithms may be specialized for this case. Of course the
iterators that have this quality can be identified by specializing
a traits class.

Many of the algorithms must be specialized for the streambuf
iterators, to take advantage of block-mode operations, in order
to allow iostream/locale operations' performance not to suffer.
It may be that they could be treated as staged iterators and
take advantage of those optimizations.

Chapter 25  Algorithms
----------------------
Headers: <algorithm>
C headers: <cstdlib> (also in 18, 21, 26)

The algorithms are "mostly complete". As mentioned above, they
are optimized for speed at the expense of code and data size.
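
The "two-stage" iterator idea from chapter 24 could drive algorithm
selection roughly as in the sketch below. Everything here is
hypothetical: stage_traits, the two tags, and block_iterator (a toy
iterator over fixed-size blocks standing in for a deque iterator) are
invented for illustration and do not reflect the SGI code.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

struct plain_tag {};
struct staged_tag {};

// Traits class: iterators are assumed not to be staged by default.
template <class It>
struct stage_traits { typedef plain_tag category; };

const std::size_t block_size = 4;

// Toy staged iterator over an array of fixed-size blocks.
struct block_iterator {
  const int (*block)[block_size];  // current block
  std::size_t pos;                 // position within it

  const int& operator*() const
  { assert(pos < block_size); return (*block)[pos]; }
  block_iterator& operator++()
  { if (++pos == block_size) { ++block; pos = 0; } return *this; }
  friend bool operator==(const block_iterator& a, const block_iterator& b)
  { return a.block == b.block && a.pos == b.pos; }
  friend bool operator!=(const block_iterator& a, const block_iterator& b)
  { return !(a == b); }
};

template <>
struct stage_traits<block_iterator> { typedef staged_tag category; };

// General case: one full iterator comparison per element.
template <class It, class T>
It find_impl(It first, It last, const T& value, plain_tag)
{
  while (first != last && !(*first == value))
    ++first;
  return first;
}

// Staged case: the inner loop runs over raw pointers and tests only
// the cheap intermediate limit (the end of the current block).
template <class T>
block_iterator find_impl(block_iterator first, block_iterator last,
                         const T& value, staged_tag)
{
  while (first != last) {
    const bool final_block = (first.block == last.block);
    const int* const base = *first.block;
    const int* p = base + first.pos;
    const int* const stop = base + (final_block ? last.pos : block_size);
    while (p != stop && !(*p == value))
      ++p;
    if (p != stop) {                       // found within this block
      first.pos = static_cast<std::size_t>(p - base);
      return first;
    }
    if (final_block)
      return last;
    ++first.block;                         // move to the next stage
    first.pos = 0;
  }
  return last;
}

// Public interface: dispatches on the traits class.
template <class It, class T>
inline It find(It first, It last, const T& value)
{
  return find_impl(first, last, value,
                   typename stage_traits<It>::category());
}

} // namespace sketch
```

Ordinary pointers take the general path, while block_iterator ranges
are searched one block at a time.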

Specializations of many of the algorithms for non-STL types would
give performance improvements, but we must use great care not to
interfere with fragile template overloading semantics for the
standard interfaces. Conventionally the standard function template
interface is an inline which delegates to a non-standard function
which is then overloaded (this is already done in many places in
the library). Particularly appealing opportunities for the sake of
iostream performance are for copy and find applied to streambuf
iterators or (as noted elsewhere) for staged iterators, of which
the streambuf iterators are a good example.

The bsearch and qsort functions cannot be overloaded properly as
required by the standard because gcc does not yet allow overloading
on the extern-"C"-ness of a function pointer.

Chapter 26  Numerics
--------------------
Headers: <complex> <valarray> <numeric>
C headers: <cmath>, <cstdlib> (also 18, 21, 25)

Numeric components: Gabriel dos Reis's valarray, Drepper's complex,
and the few algorithms from the STL are "mostly done". Of course
optimization opportunities abound for the numerically literate. It
is not clear whether the valarray implementation really conforms
fully, in the assumptions it makes about aliasing (and lack thereof)
in its arguments.

The C div() and ldiv() functions are interesting, because they are the
only case where a C library function returns a class object by value.
Since the C++ type div_t must be different from the underlying C type
(which is in the wrong namespace) the underlying functions div() and
ldiv() cannot be re-used efficiently. Fortunately they are trivial to
re-implement.
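
To show how little is involved, here is a sketch of such a
re-implementation. It lives in a hypothetical namespace sketch rather
than std, and it assumes a compiler on which integer division
truncates toward zero (guaranteed by the language only since C++11).

```cpp
#include <cassert>

namespace sketch {

// Result types defined in our own namespace, not reused from C.
struct div_t  { int quot;  int rem;  };
struct ldiv_t { long quot; long rem; };

inline div_t div(int numer, int denom)
{
  assert(denom != 0);
  div_t r;
  r.quot = numer / denom;  // truncates toward zero
  r.rem  = numer % denom;  // so that quot * denom + rem == numer
  return r;
}

inline ldiv_t ldiv(long numer, long denom)
{
  assert(denom != 0);
  ldiv_t r;
  r.quot = numer / denom;
  r.rem  = numer % denom;
  return r;
}

// C++ additionally overloads div for long arguments.
inline ldiv_t div(long numer, long denom)
{ return ldiv(numer, denom); }

} // namespace sketch
```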

Chapter 27  Iostreams
---------------------
Headers: <iosfwd> <streambuf> <ios> <ostream> <istream> <iostream>
         <iomanip> <sstream> <fstream>
C headers: <cstdio> <cwchar> (also in 21)

Iostream is currently in a very incomplete state. <iosfwd>, <iomanip>,
ios_base, and basic_ios<> are "mostly complete". basic_streambuf<> and
basic_ostream<> are well along, but basic_istream<> has had little work
done. The standard stream objects, <sstream> and <fstream> have been
started; basic_filebuf<> "write" functions have been implemented just
enough to do "hello, world".

Most of the istream and ostream operators << and >> (with the exception
of the op<<(integer) ones) have not been changed to use locale primitives,
sentry objects, or char_traits members.

All these templates should be manually instantiated for char and
wchar_t in a way that links only used members into user programs.

Streambuf is fertile ground for optimization extensions. An extended
interface giving iterator access to its internal buffer would be very
useful for other library components.

Iostream operations (primarily operators << and >>) can take advantage
of the case where user code has not specified a locale, and bypass locale
operations entirely. The current implementation of op<</num_put<>::put,
for the integer types, demonstrates how they can cache encoding details
from the locale on each operation. There is lots more room for
optimization in this area.

The definition of the relationship between the standard streams
cout et al. and stdout et al. requires something like a "stdiobuf".
The SGI solution of using double-indirection to actually use a
stdio FILE object for buffering is unsatisfactory, because it
interferes with peephole loop optimizations.

The <sstream> header work has begun.
stringbuf can benefit from 1717 friendship with basic_string<> and basic_string<>::_Rep to use 1718 those objects directly as buffers, and avoid allocating and making 1719 copies. 1720 1721 The basic_filebuf<> template is a complex beast. It is specified to 1722 use the locale facet codecvt<> to translate characters between native 1723 files and the locale character encoding. In general this involves 1724 two buffers, one of "char" representing the file and another of 1725 "char_type", for the stream, with codecvt<> translating. The process 1726 is complicated by the variable-length nature of the translation, and 1727 the need to seek to corresponding places in the two representations. 1728 For the case of basic_filebuf<char>, when no translation is needed, 1729 a single buffer suffices. A specialized filebuf can be used to reduce 1730 code space overhead when no locale has been imbued. Matt Austern's 1731 work at SGI will be useful, perhaps directly as a source of code, or 1732 at least as an example to draw on. 1733 1734 Filebuf, almost uniquely (cf. operator new), depends heavily on 1735 underlying environmental facilities. In current releases iostream 1736 depends fairly heavily on libio constant definitions, but it should 1737 be made independent. It also depends on operating system primitives 1738 for file operations. There is immense room for optimizations using 1739 (e.g.) mmap for reading. The shadow/ directory wraps, besides the 1740 standard C headers, the libio.h and unistd.h headers, for use mainly 1741 by filebuf. These wrappings have not been completed, though there 1742 is scaffolding in place. 1743 1744 The encapsulation of certain C header <cstdio> names presents an 1745 interesting problem. It is possible to define an inline std::fprintf() 1746 implemented in terms of the 'extern "C"' vfprintf(), but there is no 1747 standard vfscanf() to use to implement std::fscanf(). 
It appears that 1748 vfscanf but be re-implemented in C++ for targets where no vfscanf 1749 extension has been defined. This is interesting in that it seems 1750 to be the only significant case in the C library where this kind of 1751 rewriting is necessary. (Of course Glibc provides the vfscanf() 1752 extension.) (The functions related to exit() must be rewritten 1753 for other reasons.) 1754 1755 1756 Annex D 1757 ------- 1758 Headers: <strstream> 1759 1760 Annex D defines many non-library features, and many minor 1761 modifications to various headers, and a complete header. 1762 It is "mostly done", except that the libstdc++-2 <strstream> 1763 header has not been adopted into the library, or checked to 1764 verify that it matches the draft in those details that were 1765 clarified by the committee. Certainly it must at least be 1766 moved into the std namespace. 1767 1768 We still need to wrap all the deprecated features in #if guards 1769 so that pedantic compile modes can detect their use. 1770 1771 Nonstandard Extensions 1772 ---------------------- 1773 Headers: <iostream.h> <strstream.h> <hash> <rbtree> 1774 <pthread_alloc> <stdiobuf> (etc.) 1775 1776 User code has come to depend on a variety of nonstandard components 1777 that we must not omit. Much of this code can be adopted from 1778 libstdc++-v2 or from the SGI STL. This particularly includes 1779 <iostream.h>, <strstream.h>, and various SGI extensions such 1780 as <hash_map.h>. Many of these are already placed in the 1781 subdirectories ext/ and backward/. (Note that it is better to 1782 include them via "<backward/hash_map.h>" or "<ext/hash_map>" than 1783 to search the subdirectory itself via a "-I" directive. 1784 </literallayout> 1785</section> 1786 1787</appendix> 1788