<appendix xmlns="http://docbook.org/ns/docbook" version="5.0"
          xml:id="appendix.contrib" xreflabel="Contributing">
<?dbhtml filename="appendix_contributing.html"?>

<info><title>
  Contributing
  <indexterm>
    <primary>Appendix</primary>
    <secondary>Contributing</secondary>
  </indexterm>
</title>
  <keywordset>
    <keyword>ISO C++</keyword>
    <keyword>library</keyword>
  </keywordset>
</info>

<para>
  The GNU C++ Library follows an open development model. Active
  contributors are assigned maintainership responsibility and given
  write access to the source repository. First-time contributors
  should follow this procedure:
</para>

<section xml:id="contrib.list" xreflabel="Contributor Checklist"><info><title>Contributor Checklist</title></info>

  <section xml:id="list.reading"><info><title>Reading</title></info>

    <itemizedlist>
      <listitem>
        <para>
          Get and read the relevant sections of the C++ language
          specification. Copies of the full ISO 14882 standard are
          available online via the ISO mirror site for committee
          members. Non-members, or those who have not paid for the
          privilege of sitting on the committee and sustained their
          two-meeting commitment for voting rights, may get a copy of
          the standard from their respective national standards
          organization. In the USA, this national standards
          organization is
          <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.ansi.org">ANSI</link>.
          (And if you've already registered with them you can
          <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://webstore.ansi.org/RecordDetail.aspx?sku=INCITS%2fISO%2fIEC+14882-2003">buy the standard online</link>.)
        </para>
      </listitem>

      <listitem>
        <para>
          The library working group's bug reports and known defects
          can be obtained here:
          <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/">http://www.open-std.org/jtc1/sc22/wg21</link>
        </para>
      </listitem>

      <listitem>
        <para>
          The newsgroup dedicated to standardization issues is
          comp.std.c++: the
          <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.comeaucomputing.com/csc/faq.html">FAQ</link>
          for this group is quite useful.
        </para>
      </listitem>

      <listitem>
        <para>
          Peruse
          the <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.gnu.org/prep/standards/">GNU
          Coding Standards</link>, and chuckle when you hit the part
          about <quote>Using Languages Other Than C</quote>.
        </para>
      </listitem>

      <listitem>
        <para>
          Be familiar with the extensions that take precedence over
          these general GNU rules. These style issues for libstdc++ can
          be found in <link linkend="contrib.coding_style">Coding Style</link>.
        </para>
      </listitem>

      <listitem>
        <para>
          And last but certainly not least, read the
          library-specific information found in
          <link linkend="appendix.porting">Porting and Maintenance</link>.
        </para>
      </listitem>
    </itemizedlist>

  </section>
  <section xml:id="list.copyright"><info><title>Assignment</title></info>

    <para>
      Small changes can be accepted without a copyright assignment form on
      file. New code and additions to the library need a completed copyright
      assignment form on file at the FSF. Note: your employer may be required
      to fill out appropriate disclaimer forms as well.
    </para>

    <para>
      Historically, the libstdc++ assignment form added the following
      question:
    </para>

    <para>
      <quote>
        Which Belgian comic book character is better, Tintin or Asterix, and
        why?
      </quote>
    </para>

    <para>
      While not strictly necessary, humoring the maintainers and answering
      this question would be appreciated.
    </para>

    <para>
      For more information about getting a copyright assignment, please see
      <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.gnu.org/prep/maintain/html_node/Legal-Matters.html">Legal
      Matters</link>.
    </para>

    <para>
      Please contact Benjamin Kosnik at
      <email>bkoz+assign@redhat.com</email> if you are confused
      about the assignment or have general licensing questions. When
      requesting an assignment form from
      <email>assign@gnu.org</email>, please cc the libstdc++
      maintainer above so that progress can be monitored.
    </para>
  </section>

  <section xml:id="list.getting"><info><title>Getting Sources</title></info>

    <para>
      <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/svnwrite.html">Getting write access
      (look for "Write after approval")</link>
    </para>
  </section>

  <section xml:id="list.patches"><info><title>Submitting Patches</title></info>

    <para>
      Every patch must have several pieces of information before it can be
      properly evaluated. Ideally (and to ensure the fastest possible
      response from the maintainers) it would have all of these pieces:
    </para>

    <itemizedlist>
      <listitem>
        <para>
          A description of the bug and how your patch fixes this
          bug. For new features, a description of the feature and your
          implementation.
        </para>
      </listitem>

      <listitem>
        <para>
          A ChangeLog entry as plain text; see the various
          ChangeLog files for format and content. If you are
          using emacs as your editor, simply position the insertion
          point at the beginning of your change and hit C-x 4 a to bring
          up the appropriate ChangeLog entry. See--magic! Similar
          functionality also exists for vi.
        </para>
      </listitem>

      <listitem>
        <para>
          A testsuite submission or sample program that will
          easily and simply show the existing error or test new
          functionality.
        </para>
      </listitem>

      <listitem>
        <para>
          The patch itself. If you are accessing the SVN
          repository use <command>svn update; svn diff NEW</command>;
          else, use <command>diff -cp OLD NEW</command> ... If your
          version of diff does not support these options, then get the
          latest version of GNU
          diff. The <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/wiki/SvnTricks">SVN
          Tricks</link> wiki page has information on customising the
          output of <code>svn diff</code>.
        </para>
      </listitem>

      <listitem>
        <para>
          When you have all these pieces, bundle them up in a
          mail message and send it to libstdc++@gcc.gnu.org. All
          patches and related discussion should be sent to the
          libstdc++ mailing list.
        </para>
      </listitem>
    </itemizedlist>

  </section>

</section>

<section xml:id="contrib.organization" xreflabel="Source Organization"><info><title>Directory Layout and Source Conventions</title></info>
  <?dbhtml filename="source_organization.html"?>

  <para>
    The unpacked source directory of libstdc++ contains the files
    needed to create the GNU C++ Library.
  </para>

  <literallayout class="normal">
It has subdirectories:

  doc
    Files in HTML and text format that document usage, quirks of the
    implementation, and contributor checklists.

  include
    All header files for the C++ library are within this directory,
    modulo specific runtime-related files that are in the libsupc++
    directory.

    include/std
      Files meant to be found by #include <name> directives in
      standard-conforming user programs.

    include/c
      Headers intended to directly include standard C headers.
      [NB: this can be enabled via --enable-cheaders=c]

    include/c_global
      Headers intended to include standard C headers in
      the global namespace, and put select names into the std::
      namespace. [NB: this is the default, and is the same as
      --enable-cheaders=c_global]

    include/c_std
      Headers intended to include standard C headers
      already in namespace std, and put select names into the std::
      namespace. [NB: this is the same as --enable-cheaders=c_std]

    include/bits
      Files included by standard headers and by other files in
      the bits directory.

    include/backward
      Headers provided for backward compatibility, such as <iostream.h>.
      They are not used in this library.

    include/ext
      Headers that define extensions to the standard library. No
      standard header refers to any of them.

  scripts
    Scripts that are used during the configure, build, make, or test
    process.

  src
    Files that are used in constructing the library, but are not
    installed.

  testsuite/[backward, demangle, ext, performance, thread, 17_* to 30_*]
    Test programs are here, and may be used to begin to exercise the
    library. Support for "make check" and "make check-install" is
    complete, and runs through all the subdirectories here when this
    command is issued from the build directory. Please note that
    "make check" requires DejaGNU 1.4 or later to be installed. Please
    note that "make check-script" calls the script mkcheck, which
    requires bash, and which may need the paths to bash adjusted to
    work properly, as /bin/bash is assumed.

Other subdirectories contain variant versions of certain files
that are meant to be copied or linked by the configure script.
Currently these are:

  config/abi
  config/cpu
  config/io
  config/locale
  config/os

In addition, a subdirectory holds the convenience library libsupc++.

  libsupc++
    Contains the runtime library for C++, including exception
    handling and memory allocation and deallocation, RTTI, terminate
    handlers, etc.

Note that glibc also has a bits/ subdirectory. We will need either
to be careful not to collide with names in its bits/ directory, or
to rename bits to (e.g.) cppbits/.

In files throughout the system, lines marked with an "XXX" indicate
a bug or incompletely-implemented feature. Lines marked "XXX MT"
indicate a place that may require attention for multi-thread safety.
  </literallayout>

</section>

<section xml:id="contrib.coding_style" xreflabel="Coding Style"><info><title>Coding Style</title></info>
  <?dbhtml filename="source_code_style.html"?>

  <para>
  </para>
  <section xml:id="coding_style.bad_identifiers"><info><title>Bad Identifiers</title></info>

    <para>
      Identifiers that conflict and should be avoided.
    </para>

    <literallayout class="normal">
      This is the list of names <quote>reserved to the
      implementation</quote> that have been claimed by certain
      compilers and system headers of interest, and should not be used
      in the library. It will grow, of course. We generally are
      interested in names that are not all-caps, except for those like
      "_T"

      For Solaris:
      _B
      _C
      _L
      _N
      _P
      _S
      _U
      _X
      _E1
      ..
      _E24

      Irix adds:
      _A
      _G

      MS adds:
      _T

      BSD adds:
      __used
      __unused
      __inline
      _Complex
      __istype
      __maskrune
      __tolower
      __toupper
      __wchar_t
      __wint_t
      _res
      _res_ext
      __tg_*

      SPU adds:
      __ea

      For GCC:

      [Note that this list is out of date. It applies to the old
      name-mangling; in G++ 3.0 and higher a different name-mangling is
      used. In addition, many of the bugs relating to G++ interpreting
      these names as operators have been fixed.]

      The full set of __* identifiers (combined from gcc/cp/lex.c and
      gcc/cplus-dem.c) that are either old or new, but are definitely
      recognized by the demangler, is:

      __aa
      __aad
      __ad
      __addr
      __adv
      __aer
      __als
      __alshift
      __amd
      __ami
      __aml
      __amu
      __aor
      __apl
      __array
      __ars
      __arshift
      __as
      __bit_and
      __bit_ior
      __bit_not
      __bit_xor
      __call
      __cl
      __cm
      __cn
      __co
      __component
      __compound
      __cond
      __convert
      __delete
      __dl
      __dv
      __eq
      __er
      __ge
      __gt
      __indirect
      __le
      __ls
      __lt
      __max
      __md
      __method_call
      __mi
      __min
      __minus
      __ml
      __mm
      __mn
      __mult
      __mx
      __ne
      __negate
      __new
      __nop
      __nt
      __nw
      __oo
      __op
      __or
      __pl
      __plus
      __postdecrement
      __postincrement
      __pp
      __pt
      __rf
      __rm
      __rs
      __sz
      __trunc_div
      __trunc_mod
      __truth_andif
      __truth_not
      __truth_orif
      __vc
      __vd
      __vn

      SGI badnames:
      __builtin_alloca
      __builtin_fsqrt
      __builtin_sqrt
      __builtin_fabs
      __builtin_dabs
      __builtin_cast_f2i
      __builtin_cast_i2f
      __builtin_cast_d2ll
      __builtin_cast_ll2d
      __builtin_copy_dhi2i
      __builtin_copy_i2dhi
      __builtin_copy_dlo2i
      __builtin_copy_i2dlo
      __add_and_fetch
      __sub_and_fetch
      __or_and_fetch
      __xor_and_fetch
      __and_and_fetch
      __nand_and_fetch
      __mpy_and_fetch
      __min_and_fetch
      __max_and_fetch
      __fetch_and_add
      __fetch_and_sub
      __fetch_and_or
      __fetch_and_xor
      __fetch_and_and
      __fetch_and_nand
      __fetch_and_mpy
      __fetch_and_min
      __fetch_and_max
      __lock_test_and_set
      __lock_release
      __lock_acquire
      __compare_and_swap
      __synchronize
      __high_multiply
      __unix
      __sgi
      __linux__
      __i386__
      __i486__
      __cplusplus
      __embedded_cplusplus
      // long double conversion members
      // mangled as __opr
      // http://gcc.gnu.org/ml/libstdc++/1999-q4/msg00060.html
      __opr
    </literallayout>
  </section>

  <section xml:id="coding_style.example"><info><title>By Example</title></info>

    <literallayout class="normal">
      This library is written to appropriate C++ coding standards. As such,
      it is intended to take precedence over the recommendations of the GNU
      Coding Standard, which can be referenced in full here:

      <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.gnu.org/prep/standards/standards.html#Formatting">http://www.gnu.org/prep/standards/standards.html#Formatting</link>

      The rest of this is also interesting reading, but skip the "Design
      Advice" part.

      The GCC coding conventions are here, and are also useful:
      <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/codingconventions.html">http://gcc.gnu.org/codingconventions.html</link>

      In addition, because it doesn't seem to be stated explicitly anywhere
      else, there is an 80 column source limit.

      <filename>ChangeLog</filename> entries for member functions should use the
      classname::member function name syntax as follows:

<code>
1999-04-15  Dennis Ritchie  <dr@att.com>

        * src/basic_file.cc (__basic_file::open): Fix thinko in
        _G_HAVE_IO_FILE_OPEN bits.
</code>

      Notable areas of divergence from what may be previous local practice
      (particularly for GNU C) include:

      01. Pointers and references
      <code>
        char* p = "flop";
        char& c = *p;
          -NOT-
        char *p = "flop";  // wrong
        char &c = *p;      // wrong
      </code>

      Reason: In C++, definitions are mixed with executable code. Here,
      <code>p</code> is being initialized, not <code>*p</code>. This is near-universal
      practice among C++ programmers; it is normal for C hackers
      to switch spontaneously as they gain experience.

      02. Operator names and parentheses
      <code>
        operator==(type)
          -NOT-
        operator == (type)  // wrong
      </code>

      Reason: The <code>==</code> is part of the function name. Separating
      it makes the declaration look like an expression.

      03. Function names and parentheses
      <code>
        void mangle()
          -NOT-
        void mangle ()  // wrong
      </code>

      Reason: no space before parentheses (except after a control-flow
      keyword) is near-universal practice for C++. It identifies the
      parentheses as the function-call operator or declarator, as
      opposed to an expression or other overloaded use of parentheses.

      04. Template function indentation
      <code>
        template<typename T>
          void
          template_function(args)
          { }
          -NOT-
        template<class T>
        void template_function(args) {};
      </code>

      Reason: In class definitions, without indentation whitespace is
      needed both above and below the declaration to distinguish
      it visually from other members. (Also, re: "typename"
      rather than "class".) <code>T</code> often could be <code>int</code>, which is
      not a class. ("class", here, is an anachronism.)

      05. Template class indentation
      <code>
        template<typename _CharT, typename _Traits>
          class basic_ios : public ios_base
          {
          public:
            // Types:
          };
          -NOT-
        template<class _CharT, class _Traits>
        class basic_ios : public ios_base
          {
          public:
            // Types:
          };
          -NOT-
        template<class _CharT, class _Traits>
          class basic_ios : public ios_base
        {
          public:
            // Types:
        };
      </code>

      06. Enumerators
      <code>
        enum
        {
          space = _ISspace,
          print = _ISprint,
          cntrl = _IScntrl
        };
          -NOT-
        enum { space = _ISspace, print = _ISprint, cntrl = _IScntrl };
      </code>

      07. Member initialization lists
      All one line, separate from class name.

      <code>
        gribble::gribble()
        : _M_private_data(0), _M_more_stuff(0), _M_helper(0)
        { }
          -NOT-
        gribble::gribble() : _M_private_data(0), _M_more_stuff(0), _M_helper(0)
        { }
      </code>

      08. Try/Catch blocks
      <code>
        try
          {
            //
          }
        catch (...)
          {
            //
          }
          -NOT-
        try {
          //
        } catch(...) {
          //
        }
      </code>

      09. Member function declarations and definitions
      Keywords such as extern, static, export, explicit, inline, etc.
      go on the line above the function name. Thus

      <code>
        virtual int
        foo()
          -NOT-
        virtual int foo()
      </code>

      Reason: GNU coding conventions dictate that, for definitions, the
      return type goes on a separate line from the function name and
      parameter list. For C++, where we have member functions that can
      be either inline definitions or declarations, keeping to this
      standard allows all member function names for a given class to be
      aligned to the same margin, increasing readability.


      10. Invocation of member functions with "this->"
      For non-uglified names, use <code>this->name</code> to call the function.

      <code>
        this->sync()
          -NOT-
        sync()
      </code>

      Reason: Koenig lookup.

      11. Namespaces
      <code>
        namespace std
        {
          blah blah blah;
        } // namespace std

          -NOT-

        namespace std {
          blah blah blah;
        } // namespace std
      </code>

      12. Spacing under protected and private in class declarations:
      space above, none below
      i.e.

      <code>
        public:
          int foo;

          -NOT-
        public:

          int foo;
      </code>

      13. Spacing WRT return statements.
      no extra spacing before returns, no parentheses
      i.e.

      <code>
        }
        return __ret;

          -NOT-
        }

        return __ret;

          -NOT-

        }
        return (__ret);
      </code>


      14. Location of global variables.
      All global variables of class type, whether in the "user visible"
      space (e.g., <code>cin</code>) or the implementation namespace, must be defined
      as a character array with the appropriate alignment and then later
      re-initialized to the correct value.

      This is due to startup issues on certain platforms, such as AIX.
      For more explanation and examples, see <filename>src/globals.cc</filename>. All such
      variables should be contained in that file, for simplicity.

      15. Exception abstractions
      Use the exception abstractions found in <filename class="headerfile">functexcept.h</filename>, which allow
      C++ programmers to use this library with <literal>-fno-exceptions</literal>. (Even if
      that is rarely advisable, it's a necessary evil for backwards
      compatibility.)

      16. Exception error messages
      All start with the name of the function where the exception is
      thrown, and then (optional) descriptive text is added. Example:

      <code>
        __throw_logic_error(__N("basic_string::_S_construct NULL not valid"));
      </code>

      Reason: The verbose terminate handler prints out <code>exception::what()</code>,
      as well as the typeinfo for the thrown exception. As this is the
      default terminate handler, by putting location info into the
      exception string, a very useful error message is printed out for
      uncaught exceptions. So useful, in fact, that non-programmers can
      give useful error messages, and programmers can intelligently
      speculate what went wrong without even using a debugger.

      17. The doxygen style guide to comments is a separate document,
      see index.

      The library currently has a mixture of GNU-C and modern C++ coding
      styles. The GNU C usages will be combed out gradually.

      Name patterns:

      For nonstandard names appearing in Standard headers, we are constrained
      to use names that begin with underscores. This is called "uglification".
      The convention is:

      Local and argument names:  <literal>__[a-z].*</literal>

      Examples:  <code>__count  __ix  __s1</code>

      Type names and template formal-argument names: <literal>_[A-Z][^_].*</literal>

      Examples:  <code>_Helper  _CharT  _N</code>

      Member data and function names: <literal>_M_.*</literal>

      Examples:  <code>_M_num_elements  _M_initialize ()</code>

      Static data members, constants, and enumerations: <literal>_S_.*</literal>

      Examples: <code>_S_max_elements  _S_default_value</code>

      Don't use names in the same scope that differ only in the prefix,
      e.g. _S_top and _M_top. See BADNAMES for a list of forbidden names.
      (The most tempting of these seem to be "_T" and "__sz".)

      Names must never have "__" internally; it would confuse name
      unmanglers on some targets. Also, never use "__[0-9]", same reason.

      --------------------------

      [BY EXAMPLE]
      <code>

      #ifndef  _HEADER_
      #define  _HEADER_ 1

      namespace std
      {
        class gribble
        {
        public:
          gribble() throw();

          gribble(const gribble&);

          explicit
          gribble(int __howmany);

          gribble&
          operator=(const gribble&);

          virtual
          ~gribble() throw();

          // Start with a capital letter, end with a period.
          inline void
          public_member(const char* __arg) const;

          // In-class function definitions should be restricted to one-liners.
          int
          one_line() { return 0; }

          int
          two_lines(const char* __arg)
          { return strchr(__arg, 'a') != 0; }

          inline int
          three_lines();  // inline, but defined below.

          // Note indentation.
          template<typename _Formal_argument>
            void
            public_template() const throw();

          template<typename _Iterator>
            void
            other_template();

        private:
          class _Helper;

          int _M_private_data;
          int _M_more_stuff;
          _Helper* _M_helper;
          int _M_private_function();

          enum _Enum
            {
              _S_one,
              _S_two
            };

          static void
          _S_initialize_library();
        };

        // More-or-less-standard language features described by lack, not presence.
      # ifndef _G_NO_LONGLONG
        extern long long _G_global_with_a_good_long_name;  // avoid globals!
      # endif

        // Avoid in-class inline definitions, define separately;
        // likewise for member class definitions:
        inline int
        gribble::public_member() const
        { int __local = 0; return __local; }

        class gribble::_Helper
        {
          int _M_stuff;

          friend class gribble;
        };
      }

      // Names beginning with "__": only for arguments and
      //   local variables; never use "__" in a type name, or
      //   within any name; never use "__[0-9]".

      #endif /* _HEADER_ */


      namespace std
      {
        template<typename T>  // notice: "typename", not "class", no space
          long_return_value_type<with_many, args>
          function_name(char* pointer,               // "char *pointer" is wrong.
                        char* argument,
                        const Reference& ref)
          {
            // int a_local;  /* wrong; see below. */
            if (test)
            {
              nested code
            }

            int a_local = 0;  // declare variable at first use.

            //  char a, b, *p;   /* wrong */
            char a = 'a';
            char b = a + 1;
            char* c = "abc";  // each variable goes on its own line, always.

            // except maybe here...
            for (unsigned i = 0, mask = 1; mask; ++i, mask <<= 1) {
                // ...
            }
          }

        gribble::gribble()
        : _M_private_data(0), _M_more_stuff(0), _M_helper(0)
        { }

        int
        gribble::three_lines()
        {
          // doesn't fit in one line.
        }
      } // namespace std
      </code>
    </literallayout>
  </section>
</section>

<section xml:id="contrib.design_notes" xreflabel="Design Notes"><info><title>Design Notes</title></info>
  <?dbhtml filename="source_design_notes.html"?>

  <para>
  </para>

  <literallayout class="normal">

  The Library
  -----------

  This paper covers two major areas:

  - Features and policies not mentioned in the standard that
    the quality of the library implementation depends on, including
    extensions and "implementation-defined" features;

  - Plans for required but unimplemented library features and
    optimizations to them.

  Overhead
  --------

  The standard defines a large library, much larger than the standard
  C library. A naive implementation would suffer substantial overhead
  in compile time, executable size, and speed, rendering it unusable
  in many (particularly embedded) applications. The alternative demands
  care in construction, and some compiler support, but there is no
  need for library subsets.

  What are the sources of this overhead? There are four main causes:

  - The library is specified almost entirely as templates, which
    with current compilers must be included in-line, resulting in
    very slow builds as tens or hundreds of thousands of lines
    of function definitions are read for each user source file.
    Indeed, the entire SGI STL, as well as the dos Reis valarray,
    are provided purely as header files, largely for simplicity in
    porting. Iostream/locale is (or will be) as large again.

  - The library is very flexible, specifying a multitude of hooks
    where users can insert their own code in place of defaults.
    When these hooks are not used, any time and code expended to
    support that flexibility is wasted.

  - Templates are often described as causing "code bloat".
    In practice, this refers (when it refers to anything real) to several
    independent processes. First, when a class template is manually
    instantiated in its entirety, current compilers place the definitions
    for all members in a single object file, so that a program linking
    to one member gets definitions of all. Second, template functions
    which do not actually depend on the template argument are, under
    current compilers, generated anew for each instantiation, rather
    than being shared with other instantiations. Third, some of the
    flexibility mentioned above comes from virtual functions (both in
    regular classes and template classes) which current linkers add
    to the executable file even when they manifestly cannot be called.

  - The library is specified to use a language feature, exceptions,
    which in the current gcc compiler ABI imposes a run time and
    code space cost to handle the possibility of exceptions even when
    they are not used. Under the new ABI (accessed with -fnew-abi),
    there is a space overhead and a small reduction in code efficiency
    resulting from lost optimization opportunities associated with
    non-local branches associated with exceptions.

  What can be done to eliminate this overhead? A variety of coding
  techniques, and compiler, linker and library improvements and
  extensions may be used, as covered below. Most are not difficult,
  and some are already implemented in varying degrees.

  Overhead: Compilation Time
  --------------------------

  Providing "ready-instantiated" template code in object code archives
  allows us to avoid generating and optimizing template instantiations
  in each compilation unit which uses them. However, the number of such
  instantiations that are useful to provide is limited, and anyway this
  is not enough, by itself, to minimize compilation time.
  In particular, it does not reduce time spent parsing conforming headers.

  Quicker header parsing will depend on library extensions and compiler
  improvements. One approach is some variation on the techniques
  previously marketed as "pre-compiled headers", now standardized as
  support for the "export" keyword. "Exported" template definitions
  can be placed (once) in a "repository" -- really just a library, but
  of template definitions rather than object code -- to be drawn upon
  at link time when an instantiation is needed, rather than placed in
  header files to be parsed along with every compilation unit.

  Until "export" is implemented we can put some of the lengthy template
  definitions in #if guards or alternative headers so that users can skip
  over the full definitions when they need only the ready-instantiated
  specializations.

  To be precise, this means that certain headers which define
  templates which users normally use only for certain arguments
  can be instrumented to avoid exposing the template definitions
  to the compiler unless a macro is defined. For example, in
  <string>, we might have:

    template <class _CharT, ... > class basic_string {
      ... // member declarations
    };
    ... // operator declarations

    #ifdef _STRICT_ISO_
    # if _G_NO_TEMPLATE_EXPORT
    #   include <bits/std_locale.h>  // headers needed by definitions
    #   ...
    #   include <bits/string.tcc>  // member and global template definitions.
    # endif
    #endif

  Users who compile without specifying a strict-ISO-conforming flag
  would not see many of the template definitions they now see, and rely
  instead on ready-instantiated specializations in the library. This
  technique would be useful for the following substantial components:
  string, locale/iostreams, valarray.
  It would *not* be useful or usable with the following: containers,
  algorithms, iterators, allocator. Since these constitute a large
  (though decreasing) fraction of the library, the benefit the
  technique offers is limited.

  The language specifies the semantics of the "export" keyword, but
  the gcc compiler does not yet support it. When it does, problems
  with large template inclusions can largely disappear, given some
  minor library reorganization, along with the need for the apparatus
  described above.

  Overhead: Flexibility Cost
  --------------------------

  The library offers many places where users can specify operations
  to be performed by the library in place of defaults. Sometimes
  this seems to require that the library use a more-roundabout, and
  possibly slower, way to accomplish the default requirements than
  would be used otherwise.

  The primary protection against this overhead is thorough compiler
  optimization, to crush out layers of inline function interfaces.
  Kuck & Associates has demonstrated the practicality of this kind
  of optimization.

  The second line of defense against this overhead is explicit
  specialization. By defining helper function templates, and writing
  specialized code for the default case, overhead can be eliminated
  for that case without sacrificing flexibility. This takes full
  advantage of any ability of the optimizer to crush out degenerate
  code.

  The library specifies many virtual functions which current linkers
  load even when they cannot be called. Some minor improvements to the
  compiler and to ld would eliminate any such overhead by simply
  omitting virtual functions that the complete program does not call.
  A prototype of this work has already been done. For targets where
  GNU ld is not used, a "pre-linker" could do the same job.
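
  As a minimal sketch of the explicit-specialization defense described
  above, a helper class template can route the default case (char with
  std::char_traits<char>) to a direct memcpy, while every other
  combination still goes through the user-replaceable traits hook. The
  names copy_helper and copy_chars are hypothetical and are not part of
  libstdc++; they are deliberately not uglified because the sketch is a
  standalone fragment rather than library code.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Generic path (hypothetical helper): honours whatever Traits::assign
// the user supplied, one character at a time.
template<typename CharT, typename Traits>
  struct copy_helper
  {
    static void
    copy(CharT* dst, const CharT* src, std::size_t n)
    {
      for (std::size_t i = 0; i < n; ++i)
        Traits::assign(dst[i], src[i]);
    }
  };

// Explicit specialization for the default case: no user hook can be
// involved, so the copy collapses to a single memcpy.
template<>
  struct copy_helper<char, std::char_traits<char> >
  {
    static void
    copy(char* dst, const char* src, std::size_t n)
    { std::memcpy(dst, src, n); }
  };

// Single entry point; the helper is selected at compile time.
template<typename CharT, typename Traits>
  inline void
  copy_chars(CharT* dst, const CharT* src, std::size_t n)
  { copy_helper<CharT, Traits>::copy(dst, src, n); }
```

  A call such as copy_chars<char, std::char_traits<char> >(buf, "hello", 6)
  selects the specialized path, while the same call with wchar_t or with
  a user-defined traits class falls back to the generic loop, so the
  flexibility is kept and only the default case is short-circuited.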

The main areas in the standard interface where user flexibility
can result in overhead are:

- Allocators: Containers are specified to use user-definable
  allocator types and objects, making tuning for the container
  characteristics tricky.

- Locales: the standard specifies locale objects used to implement
  iostream operations, involving many virtual functions which use
  streambuf iterators.

- Algorithms and containers: these may be instantiated on any type,
  frequently duplicating code for identical operations.

- Iostreams and strings: users are permitted to use these on their
  own types, and specify the operations the stream must use on these
  types.

Note that these sources of overhead are _avoidable_. The techniques
to avoid them are covered below.

Code Bloat
----------

In the SGI STL, and in some other headers, many of the templates
are defined "inline" -- either explicitly or by their placement
in class definitions -- which should not be inline. This is a
source of code bloat. Matt had remarked that he was relying on
the compiler to recognize what was too big to benefit from inlining,
and generate it out-of-line automatically. However, this also can
result in code bloat except where the linker can eliminate the extra
copies.

Fixing these cases will require an audit of all inline functions
defined in the library to determine which merit inlining, and moving
the rest out of line. This is an issue mainly in chapters 23, 25, and
27. Of course it can be done incrementally, and we should generally
accept patches that move large functions out of line and into ".tcc"
files, which can later be pulled into a repository.
Compiler/linker
improvements to recognize very large inline functions and move them
out-of-line, but shared among compilation units, could make this
work unnecessary.

Pre-instantiating template specializations currently produces large
amounts of dead code which bloats statically linked programs. The
current state of the static library, libstdc++.a, is intolerable on
this account, and will fuel further confused speculation about a need
for a library "subset". A compiler improvement that treats each
instantiated function as a separate object file, for linking purposes,
would be one solution to this problem. An alternative would be to
split up the manual instantiation files into dozens upon dozens of
little files, each compiled separately, but an abortive attempt at
this was done for <string> and, though it is far from complete, it
is already a nuisance. A better interim solution (just until we have
"export") is badly needed.

When building a shared library, the current compiler/linker cannot
automatically generate the instantiations needed. This creates a
miserable situation; it means any time something is changed in the
library, before a shared library can be built someone must manually
copy the declarations of all templates that are needed by other parts
of the library to an "instantiation" file, and add it to the build
system to be compiled and linked to the library. This process is
readily automated, and should be automated as soon as possible.
Users building their own shared libraries experience identical
frustrations.

Sharing common aspects of template definitions among instantiations
can radically reduce code bloat.
The compiler could help a great
deal here by recognizing when a function depends on nothing about
a template parameter, or only on its size, and giving the resulting
function a link-name "equate" that allows it to be shared with other
instantiations. Implementation code could take advantage of the
capability by factoring out code that does not depend on the template
argument into separate functions to be merged by the compiler.

Until such a compiler optimization is implemented, much can be done
manually (if tediously) in this direction. One such optimization is
to derive class templates from non-template classes, and move as much
implementation as possible into the base class. Another is to
partial-specialize certain common instantiations, such as vector<T*>,
to share code for instantiations on all types T. While these techniques
work, they are far from the complete solution that a compiler
improvement would afford.

Overhead: Expensive Language Features
-------------------------------------

The main "expensive" language feature used in the standard library
is exception support, which requires compiling in cleanup code with
static table data to locate it, and linking in library code to use
the table. For small embedded programs the amount of such library
code and table data is assumed by some to be excessive. Under the
"new" ABI this perception is generally exaggerated, although in some
cases it may actually be excessive.

To implement a library which does not use exceptions directly is
not difficult given minor compiler support (to "turn off" exceptions
and ignore exception constructs), and results in no great library
maintenance difficulties. To be precise, given "-fno-exceptions",
the compiler should treat "try" blocks as ordinary blocks, and
"catch" blocks as dead code to ignore or eliminate.
Compiler
support is not strictly necessary, except in the case of "function
try blocks"; otherwise the following macros almost suffice:

    #define throw(X)
    #define try if (true)
    #define catch(X) else if (false)

However, there may be a need to use function try blocks in the
library implementation, and use of macros in this way can make
correct diagnostics impossible. Furthermore, use of this scheme
would require the library to call a function to re-throw exceptions
from a try block. Implementing the above semantics in the compiler
is preferable.

Given the support above (however implemented) it only remains to
replace code that "throws" with a call to a well-documented "handler"
function in a separate compilation unit which may be replaced by
the user. The main source of exceptions that would be difficult
for users to avoid is memory allocation failures, but users can
define their own memory allocation primitives that never throw.
Otherwise, the complete list of such handlers, and which library
functions may call them, would be needed for users to be able to
implement the necessary substitutes. (Fortunately, they have the
source code.)

Opportunities
-------------

The template capabilities of C++ offer enormous opportunities for
optimizing common library operations, well beyond what would be
considered "eliminating overhead". In particular, many operations
done in Glibc with macros that depend on proprietary language
extensions can be implemented in pristine Standard C++. For example,
the chapter 25 algorithms, and even C library functions such as strchr,
can be specialized for the case of static arrays of known (small) size.

Detailed optimization opportunities are identified below, in the
discussion of the component where each would appear.
Of course new
opportunities will be identified during implementation.

Unimplemented Required Library Features
---------------------------------------

The standard specifies hundreds of components, grouped broadly by
chapter. These are listed in excruciating detail in the CHECKLIST
file.

    17 general
    18 support
    19 diagnostics
    20 utilities
    21 string
    22 locale
    23 containers
    24 iterators
    25 algorithms
    26 numerics
    27 iostreams
    Annex D  backward compatibility

Anyone participating in implementation of the library should obtain
a copy of the standard, ISO 14882. People in the U.S. can obtain an
electronic copy for US$18 from ANSI's web site. Those from other
countries should visit http://www.iso.org/ to find out the location
of their country's representation in ISO, in order to know who can
sell them a copy.

The emphasis in the following sections is on unimplemented features
and optimization opportunities.

Chapter 17 General
-------------------

Chapter 17 concerns overall library requirements.

The standard doesn't mention threads. A multi-thread (MT) extension
primarily affects operators new and delete (18), allocator (20),
string (21), locale (22), and iostreams (27). The common underlying
support needed for this is discussed under chapter 20.

The standard requirements on names from the C headers create a
lot of work, mostly done. Names in the C headers must be visible
in the std:: and sometimes the global namespace; the names in the
two scopes must refer to the same object. More stringent is that
Koenig lookup implies that any types specified as defined in std::
really are defined in std::. Names optionally implemented as
macros in C cannot be macros in C++. (An overview may be read at
<http://www.cantrip.org/cheaders.html>).
The scripts "inclosure"
and "mkcshadow", and the directories shadow/ and cshadow/, are the
beginning of an effort to conform in this area.

A correct conforming definition of C header names based on underlying
C library headers, and practical linking of conforming namespaced
customer code with third-party C libraries, depend ultimately on
an ABI change, allowing namespaced C type names to be mangled into
type names as if they were global, somewhat as C function names in a
namespace, or C++ global variable names, are left unmangled. Perhaps
another "extern" mode, such as 'extern "C-global"' would be an
appropriate place for such type definitions. Such a type would
affect mangling as follows:

    namespace A {
      struct X {};
      extern "C-global" {  // or maybe just 'extern "C"'
        struct Y {};
      };
    }
    void f(A::X*);  // mangles to f__FPQ21A1X
    void f(A::Y*);  // mangles to f__FP1Y

(It may be that this is really the appropriate semantics for regular
'extern "C"', and 'extern "C-global"', as an extension, would not be
necessary.) This would allow functions declared in non-standard C headers
(and thus fixable by neither us nor users) to link properly with functions
declared using C types defined in properly-namespaced headers. The
problem this solves is that C headers (which C++ programmers do persist
in using) frequently forward-declare C struct tags without including
the header where the type is defined, as in

    struct tm;
    void munge(tm*);

Without some compiler accommodation, munge cannot be called by correct
C++ code using a pointer to a correctly-scoped tm* value.

The current C headers use the preprocessor extension "#include_next",
which the compiler complains about when run "-pedantic".
(Incidentally, it appears that "-fpedantic" is currently ignored,
probably a bug.)
The solution in the C compiler is to use
"-isystem" rather than "-I", but unfortunately in g++ this seems
also to wrap the whole header in an 'extern "C"' block, so it's
unusable for C++ headers. The correct solution appears to be to
allow the various special include-directory options, if not given
an argument, to affect subsequent include-directory options additively,
so that if one said

    -pedantic -iprefix $(prefix) \
      -idirafter -ino-pedantic -ino-extern-c -iwithprefix -I g++-v3 \
      -iwithprefix -I g++-v3/ext

the compiler would search $(prefix)/g++-v3 and not report
pedantic warnings for files found there, but treat files in
$(prefix)/g++-v3/ext pedantically. (The undocumented semantics
of "-isystem" in g++ stink. Can they be rescinded? If not it
must be replaced with something more rationally behaved.)

All the C headers need the treatment above; in the standard these
headers are mentioned in various chapters. Below, I have only
mentioned those that present interesting implementation issues.

The components identified as "mostly complete", below, have not been
audited for conformance. In many cases where the library passes
conformance tests we have non-conforming extensions that must be
wrapped in #if guards for "pedantic" use, and in some cases renamed
in a conforming way for continued use in the implementation regardless
of conformance flags.

The STL portion of the library still depends on a header
stl/bits/stl_config.h full of #ifdef clauses. This apparatus
should be replaced with autoconf/automake machinery.

The SGI STL defines a type_traits<> template, specialized for
many types in their code including the built-in numeric and
pointer types and some library types, to direct optimizations of
standard functions.
The SGI compiler has been extended to generate
specializations of this template automatically for user types,
so that use of STL templates on user types can take advantage of
these optimizations. Specializations for other, non-STL, types
would make more optimizations possible, but extending the gcc
compiler in the same way would be much better. Probably the next
round of standardization will ratify this, but probably with
changes, so it probably should be renamed to place it in the
implementation namespace.

The SGI STL also defines a large number of extensions visible in
standard headers. (Other extensions that appear in separate headers
have been sequestered in subdirectories ext/ and backward/.) All
these extensions should be moved to other headers where possible,
and in any case wrapped in a namespace (not std!), and (where kept
in a standard header) girded about with macro guards. Some cannot be
moved out of standard headers because they are used to implement
standard features. The canonical method for accommodating these
is to use a protected name, aliased in macro guards to a user-space
name. Unfortunately C++ offers no satisfactory template typedef
mechanism, so very ad-hoc and unsatisfactory aliasing must be used
instead.

Implementation of a template typedef mechanism should have the highest
priority among possible extensions, on the same level as implementation
of the template "export" feature.

Chapter 18 Language support
----------------------------

Headers: <limits> <new> <typeinfo> <exception>
C headers: <cstddef> <climits> <cfloat> <cstdarg> <csetjmp>
           <ctime> <csignal> <cstdlib> (also 21, 25, 26)

This defines the built-in exceptions, rtti, numeric_limits<>,
operator new and delete. Much of this is provided by the
compiler in its static runtime library.

Work to do includes defining numeric_limits<> specializations in
separate files for all target architectures. Values for integer types
except for bool and wchar_t are readily obtained from the C header
<limits.h>, but values for the remaining numeric types (bool, wchar_t,
float, double, long double) must be entered manually. This is
largely dog work except for those members whose values are not
easily deduced from available documentation. Also, this involves
some work in target configuration to identify the correct choice of
file to build against and to install.

The definitions of the various operators new and delete must be
made thread-safe, which depends on a portable exclusion mechanism,
discussed under chapter 20. Of course there is always plenty of
room for improvements to the speed of operators new and delete.

<cstdarg>, in Glibc, defines some macros that gcc does not allow to
be wrapped into an inline function. Probably this header will demand
attention whenever a new target is chosen. The functions atexit(),
exit(), and abort() in cstdlib have different semantics in C++, so
must be re-implemented for C++.

Chapter 19 Diagnostics
-----------------------

Headers: <stdexcept>
C headers: <cassert> <cerrno>

This defines the standard exception objects, which are "mostly complete".
Cygnus has a version, and now SGI provides a slightly different one.
It makes little difference which we use.

The C global name "errno", which C allows to be a variable or a macro,
is required in C++ to be a macro. For MT it must typically result in
a function call.

Chapter 20 Utilities
---------------------
Headers: <utility> <functional> <memory>
C header: <ctime> (also in 18)

SGI STL provides "mostly complete" versions of all the components
defined in this chapter.
However, the auto_ptr<> implementation
is known to be wrong. Furthermore, the standard definition of it
is known to be unimplementable as written. A minor change to the
standard would fix it, and auto_ptr<> should be adjusted to match.

Multi-threading affects the allocator implementation, and there must
be configuration/installation choices for different users' MT
requirements. Anyway, users will want to tune allocator options
to support different target conditions, MT or no.

The primitives used for MT implementation should be exposed, as an
extension, for users' own work. We need cross-CPU "mutex" support,
multi-processor shared-memory atomic integer operations, and
single-processor uninterruptible integer operations, and all three
configurable to be stubbed out for non-MT use, or to use an
appropriately-loaded dynamic library for the actual runtime
environment, or statically compiled in for cases where the target
architecture is known.

Chapter 21 String
------------------
Headers: <string>
C headers: <cctype> <cwctype> <cstring> <cwchar> (also in 27)
           <cstdlib> (also in 18, 25, 26)

We have "mostly-complete" char_traits<> implementations. Many of the
char_traits<char> operations might be optimized further using existing
proprietary language extensions.

We have a "mostly-complete" basic_string<> implementation. The work
to manually instantiate char and wchar_t specializations in object
files to improve link-time behavior is extremely unsatisfactory,
literally tripling library-build time with no commensurate improvement
in static program link sizes. It must be redone. (Similar work is
needed for some components in chapters 22 and 27.)

Other work needed for strings is MT-safety, as discussed under the
chapter 20 heading.
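
The manual instantiation of char and wchar_t specializations mentioned
above can be sketched with a small stand-in template. This shows only
the basic mechanism, not a remedy for the build-time and link-size
problems just described. The names sketch and char_ops are hypothetical,
and the "extern template" declarations assume C++11 (earlier GNU
compilers offered them as an extension).

```cpp
#include <cstddef>  // std::size_t

namespace sketch {  // hypothetical names throughout

// Declaration, as users would see it in a header.
template <class CharT>
struct char_ops {
    static std::size_t length(const CharT* s);
};

// Explicit instantiation declarations: code using char or wchar_t does
// not instantiate the members itself but links to the library's copies.
extern template struct char_ops<char>;
extern template struct char_ops<wchar_t>;

// Definition, which would live in a ".tcc" file that only the library
// source (or users of other character types) needs to include.
template <class CharT>
std::size_t char_ops<CharT>::length(const CharT* s) {
    std::size_t n = 0;
    while (s[n] != CharT()) ++n;  // stop at the terminating null character
    return n;
}

// Explicit instantiation definitions, placed in exactly one library
// source file so that the object code is generated once.
template struct char_ops<char>;
template struct char_ops<wchar_t>;

}  // namespace sketch
```

In a real library the three parts would sit in separate files; they are
shown together here only so that the sketch compiles as one unit.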

The standard C type mbstate_t from <cwchar> and used in char_traits<>
must be different in C++ than in C, because in C++ the default constructor
value mbstate_t() must be the "base" or "ground" sequence state.
(According to the likely resolution of a recently raised Core issue,
this may become unnecessary. However, there are other reasons to
use a state type not as limited as whatever the C library provides.)
If we might want to provide conversions from (e.g.)
internally-represented EUC-wide to externally-represented Unicode, or
vice-versa, the mbstate_t we choose will need to be more accommodating
than what might be provided by an underlying C library.

There remain some basic_string template-member functions which do
not overload properly with their non-template brethren. The infamous
hack akin to what was done in vector<> is needed, to conform to
23.1.1 para 10. The CHECKLIST items for basic_string marked 'X',
or incomplete, are so marked for this reason.

Replacing the string iterators, which currently are simple character
pointers, with class objects would greatly increase the safety of the
client interface, and also permit a "debug" mode in which range,
ownership, and validity are rigorously checked. The current use of
raw pointers as string iterators is evil. vector<> iterators need the
same treatment. Note that the current implementation freely mixes
pointers and iterators, and that must be fixed before safer iterators
can be introduced.

Some of the functions in <cstring> differ from the C versions,
generally by being overloaded on const and non-const argument pointers.
For example, in <cstring> strchr is overloaded. The functions isupper
etc. in <cctype>, typically implemented as macros in C, are functions
in C++, because they are overloaded with others of the same name
defined in <locale>.
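
The const/non-const overloading of strchr mentioned above might look
roughly like the following. This is a sketch in a hypothetical
namespace, not the library's implementation; the real functions live in
std and would normally be layered on the C library's own strchr.

```cpp
namespace sketch {  // hypothetical namespace; the real functions live in std

// const overload: searching read-only text yields a read-only result.
inline const char* strchr(const char* s, int c) {
    const char ch = static_cast<char>(c);
    for (;; ++s) {
        if (*s == ch) return s;    // also finds the terminator when c == 0
        if (*s == '\0') return 0;  // reached the end without a match
    }
}

// non-const overload: delegates to the const version, then restores the
// constness the caller started with.
inline char* strchr(char* s, int c) {
    return const_cast<char*>(
        sketch::strchr(static_cast<const char*>(s), c));
}

}  // namespace sketch
```

With this pair, a search through a writable buffer returns a pointer
that may be written through, while a search through a string literal
cannot silently yield a writable pointer, as the single C signature
allows.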

Many of the functions required in <cwctype> and <cwchar> cannot be
implemented using underlying C facilities on intended targets because
such facilities only partly exist.

Chapter 22 Locale
------------------
Headers: <locale>
C headers: <clocale>

We have a "mostly complete" class locale, with the exception of
code for constructing, and handling the names of, named locales.
The ways that locales are named (particularly when categories
(e.g. LC_TIME, LC_COLLATE) are different) vary among all target
environments. This code must be written in various versions and
chosen by configuration parameters.

Members of many of the facets defined in <locale> are stubs. Generally,
there are two sets of facets: the base class facets (which are supposed
to implement the "C" locale) and the "byname" facets, which are supposed
to read files to determine their behavior. The base ctype<>, collate<>,
and numpunct<> facets are "mostly complete", except that the table of
bitmask values used for "is" operations, and corresponding mask values,
are still defined in libio and just included/linked. (We will need to
implement these tables independently, soon, but should take advantage
of libio where possible.) The num_put<>::put members for integer types
are "mostly complete".

A complete list of what has and has not been implemented may be
found in CHECKLIST. However, note that the current definition of
codecvt<wchar_t,char,mbstate_t> is wrong. It should simply write
out the raw bytes representing the wide characters, rather than
trying to convert each to a corresponding single "char" value.

Some of the facets are more important than others.
Specifically,
the members of ctype<>, numpunct<>, num_put<>, and num_get<> facets
are used by other library facilities defined in <string>, <istream>,
and <ostream>, and the codecvt<> facet is used by basic_filebuf<>
in <fstream>, so a conforming iostream implementation depends on
these.

The "long long" type eventually must be supported, but code mentioning
it should be wrapped in #if guards to allow pedantic-mode compiling.

Performance of num_put<> and num_get<> depends critically on
caching computed values in ios_base objects, and on extensions
to the interface with streambufs.

Specifically: retrieving a copy of the locale object, extracting
the needed facets, and gathering data from them, for each call to
(e.g.) operator<< would be prohibitively slow. To cache format
data for use by num_put<> and num_get<> we have a _Format_cache<>
object stored in the ios_base::pword() array. This is constructed
and initialized lazily, and is organized purely for utility. It
is discarded when a new locale with different facets is imbued.

Using only the public interfaces of the iterator arguments to the
facet functions would limit performance by forbidding "vector-style"
character operations. The streambuf iterator optimizations are
described under chapter 24, but facets can also bypass the streambuf
iterators via explicit specializations and operate directly on the
streambufs, and use extended interfaces to get direct access to the
streambuf internal buffer arrays. These extensions are mentioned
under chapter 27. These optimizations are particularly important
for input parsing.

Unused virtual members of locale facets can be omitted, as mentioned
above, by a smart linker.

Chapter 23 Containers
----------------------
Headers: <deque> <list> <queue> <stack> <vector> <map> <set> <bitset>

All the components in chapter 23 are implemented in the SGI STL.
They are "mostly complete"; they include a large number of
nonconforming extensions which must be wrapped. Some of these
are used internally and must be renamed or duplicated.

The SGI components are optimized for large-memory environments. For
embedded targets, different criteria might be more appropriate. Users
will want to be able to tune this behavior. We should provide
ways for users to compile the library with different memory usage
characteristics.

A lot more work is needed on factoring out common code from different
specializations to reduce code size here and in chapter 25. The
easiest fix for this would be a compiler/ABI improvement that allows
the compiler to recognize when a specialization depends only on the
size (or other gross quality) of a template argument, and allow the
linker to share the code with similar specializations. In its
absence, many of the algorithms and containers can be
partial-specialized, at least for the case of pointers, but this only
solves a small part of the problem. Use of a type_traits-style template
allows a few more optimization opportunities, more if the compiler
can generate the specializations automatically.

As an optimization, containers can specialize on the default allocator
and bypass it, or take advantage of details of its implementation
after it has been improved upon.

Replacing the vector iterators, which currently are simple element
pointers, with class objects would greatly increase the safety of the
client interface, and also permit a "debug" mode in which range,
ownership, and validity are rigorously checked.
The current use of
pointers for iterators is evil.

As mentioned for chapter 24, the deque iterator is a good example of
an opportunity to implement a "staged" iterator that would benefit
from specializations of some algorithms.

Chapter 24 Iterators
---------------------
Headers: <iterator>

Standard iterators are "mostly complete", with the exception of
the stream iterators, which are not yet templatized on the
stream type. Also, the base class template iterator<> appears
to be wrong, so everything derived from it must also be wrong,
currently.

The streambuf iterators (currently located in stl/bits/std_iterator.h,
but should be under bits/) can be rewritten to take advantage of
friendship with the streambuf implementation.

Matt Austern has identified opportunities where certain iterator
types, particularly including streambuf iterators and deque
iterators, have a "two-stage" quality, such that an intermediate
limit can be checked much more quickly than the true limit on
range operations. If identified with a member of iterator_traits,
algorithms may be specialized for this case. Of course the
iterators that have this quality can be identified by specializing
a traits class.

Many of the algorithms must be specialized for the streambuf
iterators, to take advantage of block-mode operations, in order
to allow iostream/locale operations' performance not to suffer.
It may be that they could be treated as staged iterators and
take advantage of those optimizations.

Chapter 25 Algorithms
----------------------
Headers: <algorithm>
C headers: <cstdlib> (also in 18, 21, 26)

The algorithms are "mostly complete". As mentioned above, they
are optimized for speed at the expense of code and data size.

Specializations of many of the algorithms for non-STL types would
give performance improvements, but we must use great care not to
interfere with fragile template overloading semantics for the
standard interfaces. Conventionally the standard function template
interface is an inline which delegates to a non-standard function
which is then overloaded (this is already done in many places in
the library). Particularly appealing opportunities for the sake of
iostream performance are for copy and find applied to streambuf
iterators or (as noted elsewhere) for staged iterators, of which
the streambuf iterators are a good example.

The bsearch and qsort functions cannot be overloaded properly as
required by the standard because gcc does not yet allow overloading
on the extern-"C"-ness of a function pointer.

Chapter 26 Numerics
--------------------
Headers: <complex> <valarray> <numeric>
C headers: <cmath>, <cstdlib> (also 18, 21, 25)

Numeric components: Gabriel dos Reis's valarray, Drepper's complex,
and the few algorithms from the STL are "mostly done". Of course
optimization opportunities abound for the numerically literate. It
is not clear whether the valarray implementation really conforms
fully, in the assumptions it makes about aliasing (and lack thereof)
in its arguments.

The C div() and ldiv() functions are interesting, because they are the
only case where a C library function returns a class object by value.
Since the C++ type div_t must be different from the underlying C type
(which is in the wrong namespace) the underlying functions div() and
ldiv() cannot be re-used efficiently. Fortunately they are trivial to
re-implement.
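
To show how small such a re-implementation of div() and ldiv() can be,
here is a sketch in a hypothetical namespace (the real types and
functions would live in std). It assumes C++11 or later, where integer
division truncates toward zero as C requires of div(); earlier dialects
left the result for negative operands implementation-defined. A zero
denominator is undefined here, as it is in C.

```cpp
namespace sketch {  // hypothetical stand-in for the std versions

struct div_t  { int  quot; int  rem; };
struct ldiv_t { long quot; long rem; };

// Quotient truncated toward zero; the remainder takes the sign of the
// numerator, so that quot * denom + rem == numer.
inline div_t div(int numer, int denom) {
    div_t result;
    result.quot = numer / denom;
    result.rem  = numer % denom;
    return result;
}

inline ldiv_t ldiv(long numer, long denom) {
    ldiv_t result;
    result.quot = numer / denom;
    result.rem  = numer % denom;
    return result;
}

// C++ additionally overloads div for long arguments.
inline ldiv_t div(long numer, long denom) {
    return ldiv(numer, denom);
}

}  // namespace sketch
```

For example, sketch::div(-7, 2) yields a quotient of -3 and a remainder
of -1.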

Chapter 27 Iostreams
---------------------
Headers: <iosfwd> <streambuf> <ios> <ostream> <istream> <iostream>
         <iomanip> <sstream> <fstream>
C headers: <cstdio> <cwchar> (also in 21)

Iostream is currently in a very incomplete state. <iosfwd>, <iomanip>,
ios_base, and basic_ios<> are "mostly complete". basic_streambuf<> and
basic_ostream<> are well along, but basic_istream<> has had little work
done. The standard stream objects, <sstream> and <fstream> have been
started; basic_filebuf<> "write" functions have been implemented just
enough to do "hello, world".

Most of the istream and ostream operators << and >> (with the exception
of the op<<(integer) ones) have not been changed to use locale primitives,
sentry objects, or char_traits members.

All these templates should be manually instantiated for char and
wchar_t in a way that links only used members into user programs.

Streambuf is fertile ground for optimization extensions. An extended
interface giving iterator access to its internal buffer would be very
useful for other library components.

Iostream operations (primarily operators << and >>) can take advantage
of the case where user code has not specified a locale, and bypass locale
operations entirely. The current implementation of op<</num_put<>::put,
for the integer types, demonstrates how they can cache encoding details
from the locale on each operation. There is lots more room for
optimization in this area.

The definition of the relationship between the standard streams
cout et al. and stdout et al. requires something like a "stdiobuf".
The SGI solution of using double-indirection to actually use a
stdio FILE object for buffering is unsatisfactory, because it
interferes with peephole loop optimizations.

The <sstream> header work has begun.
stringbuf can benefit from
friendship with basic_string<> and basic_string<>::_Rep to use
those objects directly as buffers, and avoid allocating and making
copies.

The basic_filebuf<> template is a complex beast. It is specified to
use the locale facet codecvt<> to translate characters between native
files and the locale character encoding. In general this involves
two buffers, one of "char" representing the file and another of
"char_type", for the stream, with codecvt<> translating. The process
is complicated by the variable-length nature of the translation, and
the need to seek to corresponding places in the two representations.
For the case of basic_filebuf<char>, when no translation is needed,
a single buffer suffices. A specialized filebuf can be used to reduce
code space overhead when no locale has been imbued. Matt Austern's
work at SGI will be useful, perhaps directly as a source of code, or
at least as an example to draw on.

Filebuf, almost uniquely (cf. operator new), depends heavily on
underlying environmental facilities. In current releases iostream
depends fairly heavily on libio constant definitions, but it should
be made independent. It also depends on operating system primitives
for file operations. There is immense room for optimizations using
(e.g.) mmap for reading. The shadow/ directory wraps, besides the
standard C headers, the libio.h and unistd.h headers, for use mainly
by filebuf. These wrappings have not been completed, though there
is scaffolding in place.

The encapsulation of certain C header <cstdio> names presents an
interesting problem. It is possible to define an inline std::fprintf()
implemented in terms of the 'extern "C"' vfprintf(), but there is no
standard vfscanf() to use to implement std::fscanf().
It appears that
vfscanf must be re-implemented in C++ for targets where no vfscanf
extension has been defined. This is interesting in that it seems
to be the only significant case in the C library where this kind of
rewriting is necessary. (Of course Glibc provides the vfscanf()
extension.) (The functions related to exit() must be rewritten
for other reasons.)


Annex D
-------
Headers: <strstream>

Annex D defines many non-library features, and many minor
modifications to various headers, and a complete header.
It is "mostly done", except that the libstdc++-2 <strstream>
header has not been adopted into the library, or checked to
verify that it matches the draft in those details that were
clarified by the committee. Certainly it must at least be
moved into the std namespace.

We still need to wrap all the deprecated features in #if guards
so that pedantic compile modes can detect their use.

Nonstandard Extensions
----------------------
Headers: <iostream.h> <strstream.h> <hash> <rbtree>
         <pthread_alloc> <stdiobuf> (etc.)

User code has come to depend on a variety of nonstandard components
that we must not omit. Much of this code can be adopted from
libstdc++-v2 or from the SGI STL. This particularly includes
<iostream.h>, <strstream.h>, and various SGI extensions such
as <hash_map.h>. Many of these are already placed in the
subdirectories ext/ and backward/. (Note that it is better to
include them via "<backward/hash_map.h>" or "<ext/hash_map>" than
to search the subdirectory itself via a "-I" directive.)
  </literallayout>
</section>

</appendix>