<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
   <meta name="AUTHOR" content="pme@gcc.gnu.org (Phil Edwards)" />
   <meta name="KEYWORDS" content="HOWTO, libstdc++, GCC, g++, libg++, STL" />
   <meta name="DESCRIPTION" content="HOWTO for the libstdc++ chapter 21." />
   <meta name="GENERATOR" content="vi and eight fingers" />
   <title>libstdc++-v3 HOWTO: Chapter 21: Strings</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Start" href="../documentation.html" type="text/html"
  title="GNU C++ Standard Library" />
<link rel="Prev" href="../20_util/howto.html" type="text/html"
  title="General Utilities" />
<link rel="Next" href="../22_locale/howto.html" type="text/html"
  title="Localization" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
<link rel="Help" href="../faq/index.html" type="text/html" title="F.A.Q." />
</head>
<body>

<h1 class="centered"><a name="top">Chapter 21: Strings</a></h1>

<p>Chapter 21 deals with the C++ strings library (a welcome relief).
</p>


<!-- ####################################################### -->
<hr />
<h1>Contents</h1>
<ul>
   <li><a href="#1">MFC's CString</a></li>
   <li><a href="#2">A case-insensitive string class</a></li>
   <li><a href="#3">Breaking a C++ string into tokens</a></li>
   <li><a href="#4">Simple transformations</a></li>
   <li><a href="#5">Making strings of arbitrary character types</a></li>
   <li><a href="#6">Shrink-to-fit strings</a></li>
</ul>

<hr />

<!-- ####################################################### -->

<h2><a name="1">MFC's CString</a></h2>
   <p>A common lament seen in various newsgroups concerns the Standard
      string class as opposed to the Microsoft Foundation Class called
      CString.  Programmers often realize that a standard, portable
      answer is better than a proprietary, nonportable one, but when porting
      their application from a Win32 platform, they discover that they
      are relying on special functions offered by the CString class.
   </p>
   <p>Things are not as bad as they seem.  In
      <a href="http://gcc.gnu.org/ml/gcc/1999-04n/msg00236.html">this
      message</a>, Joe Buck points out a few very important things:
   </p>
   <ul>
      <li>The Standard <code>string</code> supports all the operations
          that CString does, with three exceptions.
      </li>
      <li>Two of those exceptions (whitespace trimming and case
          conversion) are trivial to implement.  In fact, we do so
          on this page.
      </li>
      <li>The third is <code>CString::Format</code>, which allows formatting
          in the style of <code>sprintf</code>.  This deserves some mention:
      </li>
   </ul>
   <p><a name="1.1internal"> <!-- Coming from Chapter 27 -->
      The old libg++ library had a function called form(), which did much
      the same thing.  But for a Standard solution, you should use the
      stringstream classes.  These are the bridge between the iostream
      hierarchy and the string class, and they operate with regular
      streams seamlessly because they inherit from the iostream
      hierarchy.  A quick example:
      </a>
   </p>
   <pre>
   #include &lt;string&gt;
   #include &lt;sstream&gt;

   std::string f (const std::string&amp; incoming)   // incoming is "foo  N"
   {
       std::istringstream  incoming_stream(incoming);
       std::string         the_word;
       int                 the_number;

       incoming_stream &gt;&gt; the_word        // extract "foo"
                       &gt;&gt; the_number;     // extract N

       std::ostringstream  output_stream;
       output_stream &lt;&lt; "The word was " &lt;&lt; the_word
                     &lt;&lt; " and 3*N was " &lt;&lt; (3*the_number);

       return output_stream.str();
   } </pre>
   <p>A serious problem with CString is a design bug in its memory
      allocation.
      Specifically, quoting from that same message:
   </p>
   <pre>
   CString suffers from a common programming error that results in
   poor performance.  Consider the following code:

   CString n_copies_of (const CString&amp; foo, unsigned n)
   {
           CString tmp;
           for (unsigned i = 0; i &lt; n; i++)
                   tmp += foo;
           return tmp;
   }

   This function is O(n^2), not O(n).  The reason is that each +=
   causes a reallocation and copy of the existing string.  Microsoft
   applications are full of this kind of thing (quadratic performance
   on tasks that can be done in linear time) -- on the other hand,
   we should be thankful, as it's created such a big market for high-end
   ix86 hardware. :-)

   If you replace CString with string in the above function, the
   performance is O(n).
   </pre>
   <p>Joe Buck also pointed out some other things to keep in mind when
      comparing CString and the Standard string class:
   </p>
   <ul>
      <li>CString permits access to its internal representation; coders
          who exploited that may have problems moving to <code>string</code>.
      </li>
      <li>Microsoft ships the source to CString (in the files
          MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation
          bug and rebuild your MFC libraries.
          <em><strong>Note:</strong> It looks like the CString shipped
          with VC++ 6.0 has fixed this, although it may in fact have been
          one of the VC++ service packs that did it.</em>
      </li>
      <li><code>string</code> operations like this have O(n) complexity
          <em>if the implementors do it correctly</em>.  The libstdc++
          implementors did it correctly.  Other vendors might not.
      </li>
      <li>While parts of the SGI STL are used in libstdc++-v3, their
          string class is not.  The SGI <code>string</code> is essentially
          <code>vector&lt;char&gt;</code> and does not do any reference
          counting like libstdc++-v3's does.  (It is O(n), though.)
          So if you're thinking about SGI's string or rope classes,
          you're now looking at four possibilities:  CString, the
          libstdc++ string, the SGI string, and the SGI rope, and this
          is all before any allocator or traits customizations!  (More
          choices than you can shake a stick at -- want fries with that?)
      </li>
   </ul>
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="2">A case-insensitive string class</a></h2>
   <p>The well-known-and-if-it-isn't-well-known-it-ought-to-be
      <a href="http://www.gotw.ca/gotw/index.htm">Guru of the Week</a>
      discussions held on Usenet covered this topic in January of 1998.
164*404b540aSrobert Briefly, the challenge was, "write a 'ci_string' class which 165*404b540aSrobert is identical to the standard 'string' class, but is 166*404b540aSrobert case-insensitive in the same way as the (common but nonstandard) 167*404b540aSrobert C function stricmp():" 168*404b540aSrobert </p> 169*404b540aSrobert <pre> 170*404b540aSrobert ci_string s( "AbCdE" ); 171*404b540aSrobert 172*404b540aSrobert // case insensitive 173*404b540aSrobert assert( s == "abcde" ); 174*404b540aSrobert assert( s == "ABCDE" ); 175*404b540aSrobert 176*404b540aSrobert // still case-preserving, of course 177*404b540aSrobert assert( strcmp( s.c_str(), "AbCdE" ) == 0 ); 178*404b540aSrobert assert( strcmp( s.c_str(), "abcde" ) != 0 ); </pre> 179*404b540aSrobert 180*404b540aSrobert <p>The solution is surprisingly easy. The original answer pages 181*404b540aSrobert on the GotW website were removed into cold storage, in 182*404b540aSrobert preparation for 183*404b540aSrobert <a href="http://cseng.aw.com/bookpage.taf?ISBN=0-201-61562-2">a 184*404b540aSrobert published book of GotW notes</a>. Before being 185*404b540aSrobert put on the web, of course, it was posted on Usenet, and that 186*404b540aSrobert posting containing the answer is <a href="gotw29a.txt">available 187*404b540aSrobert here</a>. 188*404b540aSrobert </p> 189*404b540aSrobert <p>See? Told you it was easy!</p> 190*404b540aSrobert <p><strong>Added June 2000:</strong> The May issue of <u>C++ Report</u> 191*404b540aSrobert contains 192*404b540aSrobert a fascinating article by Matt Austern (yes, <em>the</em> Matt Austern) 193*404b540aSrobert on why case-insensitive comparisons are not as easy as they seem, 194*404b540aSrobert and why creating a class is the <em>wrong</em> way to go about it in 195*404b540aSrobert production code. (The GotW answer mentions one of the principle 196*404b540aSrobert difficulties; his article mentions more.) 
   </p>
   <p>Basically, this is "easy" only if you ignore some things,
      things which may be too important to your program to ignore.  (I chose
      to ignore them when originally writing this entry, and am surprised
      that nobody ever called me on it...)  The GotW question and answer
      remain useful instructional tools, however.
   </p>
   <p><strong>Added September 2000:</strong>  James Kanze provided a link to a
      <a href="http://www.unicode.org/unicode/reports/tr21/">Unicode
      Technical Report discussing case handling</a>, which provides some
      very good information.
   </p>
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="3">Breaking a C++ string into tokens</a></h2>
   <p>The Standard C (and C++) function <code>strtok()</code> leaves a lot to
      be desired in terms of user-friendliness.  It's unintuitive, it
      destroys the character string on which it operates, and it requires
      you to handle all the memory problems.  But it does let the client
      code decide what to use to break the string into pieces; it allows
      you to choose the "whitespace," so to speak.
   </p>
   <p>A C++ implementation lets us keep the good things and fix those
      annoyances.  The implementation here is more intuitive (you only
      call it once, not in a loop with varying arguments), it does not
      affect the original string at all, and all the memory allocation
      is handled for you.
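</p>
   <p>As a rough illustration of the idea, here is a minimal sketch that
      uses only <code>std::string</code>'s <code>find</code> members.  The
      name <code>split_tokens</code> is hypothetical; it is not the
      <code>stringtok</code> function introduced below, and it returns a
      <code>std::vector</code> rather than filling a caller-supplied
      container.
   </p>
   <pre>
   #include &lt;string&gt;
   #include &lt;vector&gt;

   // Split `in` at any character found in `delims` (the caller's chosen
   // "whitespace"), skipping empty tokens.  `in` itself is never modified.
   std::vector&lt;std::string&gt;
   split_tokens (const std::string&amp; in, const std::string&amp; delims = " \t\n")
   {
     std::vector&lt;std::string&gt; tokens;
     std::string::size_type start = in.find_first_not_of(delims);
     while (start != std::string::npos)
       {
         // The token ends at the next delimiter, or at the end of `in`
         // (substr clamps the length when end is npos).
         std::string::size_type end = in.find_first_of(delims, start);
         tokens.push_back(in.substr(start, end - start));
         if (end == std::string::npos)
           break;
         start = in.find_first_not_of(delims, end);
       }
     return tokens;
   }
   </pre>
   <p>Called as <code>split_tokens(" this  \t is\t\n  a test  ")</code>,
      this sketch yields the four tokens <code>this</code>, <code>is</code>,
      <code>a</code> and <code>test</code>; passing a different
      <code>delims</code> string plays the role of <code>strtok</code>'s
      second argument.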
   </p>
   <p>The implementation described above is called <code>stringtok</code>,
      and it is a template function.  It's given
      <a href="stringtok_h.txt">in this file</a> in a less-portable form than
      it could be, to keep this example simple (for example, see the
      comments on what kind of string it will accept).  The author uses
      a more general (but less readable) form of it for parsing command
      strings and the like.  If you compiled and ran this code using it:
   </p>
   <pre>
   const std::string       s (" this  \t is\t\n  a test  ");
   std::list&lt;std::string&gt;  ls;
   stringtok (ls, s);
   for (std::list&lt;std::string&gt;::const_iterator i = ls.begin();
        i != ls.end(); ++i)
   {
       std::cerr &lt;&lt; ':' &lt;&lt; (*i) &lt;&lt; ":\n";
   } </pre>
   <p>You would see this as output:
   </p>
   <pre>
   :this:
   :is:
   :a:
   :test: </pre>
   <p>with all the whitespace removed.  The original <code>s</code> is still
      available for use, <code>ls</code> will clean up after itself, and
      <code>ls.size()</code> will return how many tokens there were.
   </p>
   <p>As always, there is a price paid here, in that stringtok is not
      as fast as strtok.  The other benefits usually outweigh that, however.
      <a href="stringtok_std_h.txt">Another version of stringtok is given
      here</a>, suggested by Chris King and tweaked by Petr Prikryl,
      and this one uses the
      transformation functions mentioned below.
      If you are comfortable
      with reading the new function names, this version is recommended
      as an example.
   </p>
   <p><strong>Added February 2001:</strong>  Mark Wilden pointed out that the
      standard <code>std::getline()</code> function can be used with standard
      <a href="../27_io/howto.html">istringstreams</a> to perform
      tokenizing as well.  Build an istringstream from the input text,
      and then use <code>std::getline</code> with varying delimiters (the
      three-argument signature) to extract tokens into a string.
   </p>
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="4">Simple transformations</a></h2>
   <p>Here are Standard, simple, and portable ways to perform common
      transformations on a <code>string</code> instance, such as "convert
      to all upper case."  The word <em>transformations</em> is especially
      apt, because the standard template function
      <code>transform&lt;&gt;</code> is used.
   </p>
   <p>This code will go through some iterations (no pun intended).
      Here's the
      simplistic version usually seen on Usenet:
   </p>
   <pre>
   #include &lt;string&gt;
   #include &lt;algorithm&gt;
   #include &lt;cctype&gt;      // old &lt;ctype.h&gt;

   // The cast to unsigned char matters: passing a negative char value
   // to std::tolower or std::toupper is undefined behaviour.
   struct ToLower
   {
     char operator() (char c) const
     { return static_cast&lt;char&gt;(std::tolower(static_cast&lt;unsigned char&gt;(c))); }
   };

   struct ToUpper
   {
     char operator() (char c) const
     { return static_cast&lt;char&gt;(std::toupper(static_cast&lt;unsigned char&gt;(c))); }
   };

   int main()
   {
     std::string  s ("Some Kind Of Initial Input Goes Here");

     // Change everything into upper case
     std::transform (s.begin(), s.end(), s.begin(), ToUpper());

     // Change everything into lower case
     std::transform (s.begin(), s.end(), s.begin(), ToLower());

     // Change everything back into upper case, but store the
     // result in a different string
     std::string  capital_s;
     capital_s.resize(s.size());
     std::transform (s.begin(), s.end(), capital_s.begin(), ToUpper());
   } </pre>
   <p><span class="larger"><strong>Note</strong></span> that these calls all
      involve the global C locale through the use of the C functions
      <code>toupper/tolower</code>.  This is absolutely guaranteed to work --
      but <em>only</em> if the string contains <em>only</em> characters
      from the basic source character set, and there are <em>only</em>
      96 of those.
      That means not even all English text can be
      represented (certain British spellings, proper names, and so forth).
      So, if all your input forevermore consists of only those 96
      characters (hahahahahaha), then you're done.
   </p>
   <p><span class="larger"><strong>Note</strong></span> that the
      <code>ToUpper</code> and <code>ToLower</code> function objects
      are needed because <code>toupper</code> and <code>tolower</code>
      are overloaded names (declared in <code>&lt;cctype&gt;</code> and
      <code>&lt;locale&gt;</code>), so the template arguments for
      <code>transform&lt;&gt;</code> cannot be deduced, as explained in
      <a href="http://gcc.gnu.org/ml/libstdc++/2002-11/msg00180.html">this
      message</a>.  <!-- section 14.8.2.4 clause 16 in ISO 14882:1998
      if you're into that sort of thing -->
      At minimum, you can write short wrappers like
   </p>
   <pre>
   char toLower (char c)
   {
      return static_cast&lt;char&gt;(std::tolower(static_cast&lt;unsigned char&gt;(c)));
   } </pre>
   <p>The correct method is to use a facet for a particular locale
      and call its conversion functions.  These are discussed more in
      Chapter 22; the specific part is
      <a href="../22_locale/howto.html#7">Correct Transformations</a>,
      which shows the final version of this code.  (Thanks to James Kanze
      for assistance and suggestions on all of this.)
   </p>
   <p>Another common operation is trimming off excess whitespace.  Much
      like transformations, this task is trivial with the use of string's
      <code>find</code> family.
      These examples are broken into multiple
      statements for readability:
   </p>
   <pre>
   std::string  str (" \t blah blah blah    \n ");

   // trim leading whitespace (if str is all whitespace, notwhite
   // is npos and the whole string is erased)
   std::string::size_type  notwhite = str.find_first_not_of(" \t\n");
   str.erase(0,notwhite);

   // trim trailing whitespace (if str is now empty, notwhite is
   // npos and npos+1 wraps around to 0)
   notwhite = str.find_last_not_of(" \t\n");
   str.erase(notwhite+1); </pre>
   <p>Obviously, the calls to <code>find</code> could be inserted directly
      into the calls to <code>erase</code>, in case your compiler does not
      optimize named temporaries out of existence.
   </p>
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="5">Making strings of arbitrary character types</a></h2>
   <p>The <code>std::basic_string</code> is tantalizingly general, in that
      it is parameterized on the type of the characters which it holds.
      In theory, you could whip up a Unicode character class and instantiate
      <code>std::basic_string&lt;my_unicode_char&gt;</code>, or, assuming
      that integers are wider than characters on your platform, maybe just
      declare variables of type <code>std::basic_string&lt;int&gt;</code>.
   </p>
   <p>That's the theory.
      Remember, however, that basic_string has additional
      type parameters, which take default arguments based on the character
      type (called CharT here):
   </p>
   <pre>
      template &lt;typename CharT,
                typename Traits = char_traits&lt;CharT&gt;,
                typename Alloc = allocator&lt;CharT&gt; &gt;
      class basic_string { .... };</pre>
   <p>Now, <code>allocator&lt;CharT&gt;</code> will probably Do The Right
      Thing by default, unless you need to implement your own allocator
      for your characters.
   </p>
   <p>But <code>char_traits</code> takes more work.  The members of the
      primary char_traits template are <em>declared</em> but not
      <em>defined</em>.  That means there is only
   </p>
   <pre>
      template &lt;typename CharT&gt;
        struct char_traits
        {
            static void foo (type1 x, type2 y);
            ...
        };</pre>
   <p>and functions such as <code>char_traits&lt;CharT&gt;::foo()</code> are
      not actually defined anywhere for the general case.  The C++ standard
      permits this, because writing such a definition to fit all possible
      CharT types cannot be done.  (For a time, in earlier versions of GCC,
      there was a mostly-correct implementation that let programmers be
      lazy.  :-)  But it broke under many situations, so it was removed.
      You are no longer allowed to be lazy and non-portable.)
   </p>
   <p>The C++ standard also requires that char_traits be specialized for
      <code>char</code> and <code>wchar_t</code>, and it
      is these template specializations that permit entities like
      <code>basic_string&lt;char, char_traits&lt;char&gt; &gt;</code> to work.
   </p>
   <p>If you want to use character types other than char and wchar_t,
      such as <code>unsigned char</code> and <code>int</code>, you will
      need to write specializations for them at the present time.  If you
      want to use your own special character class, then you have
      <a href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00163.html">a lot
      of work to do</a>, especially if you wish to use i18n features
      (facets require traits information but don't have a traits argument).
   </p>
   <p>One example of how to specialize char_traits is given <a
      href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00260.html">in
      this message</a>, which was later put into the file
      <code>include/ext/pod_char_traits.h</code>.  We agree
      that the way it's used with basic_string (scroll down to main())
      doesn't look nice, but that's because <a
      href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00236.html">the
      nice-looking first attempt</a> turned out to <a
      href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00242.html">not
      be conforming C++</a>, due to the rule that CharT must be a POD.
      (See how tricky this is?)
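</p>
   <p>To give a feel for the amount of work involved, here is a minimal
      sketch of a complete specialization for a hypothetical POD type
      <code>my_char</code> (the names <code>my_char</code> and
      <code>my_string</code> are illustrative only; this is not the code
      from <code>pod_char_traits.h</code>).  It assumes a code unit stored
      in an <code>unsigned short</code>, and uses <code>unsigned long</code>
      as <code>int_type</code> so that, on typical platforms,
      <code>eof()</code> cannot collide with a real character value.
   </p>
   <pre>
   #include &lt;cstddef&gt;
   #include &lt;cstring&gt;
   #include &lt;cwchar&gt;    // std::mbstate_t
   #include &lt;ios&gt;       // std::streampos, std::streamoff
   #include &lt;string&gt;

   // Hypothetical POD character type.
   struct my_char
   {
     unsigned short value;
   };

   namespace std
   {
     template&lt;&gt;
       struct char_traits&lt;my_char&gt;
       {
         typedef my_char        char_type;
         typedef unsigned long  int_type;   // every value, plus eof()
         typedef streampos      pos_type;
         typedef streamoff      off_type;
         typedef mbstate_t      state_type;

         static void assign (char_type&amp; c1, const char_type&amp; c2) { c1 = c2; }
         static bool eq (const char_type&amp; a, const char_type&amp; b)
         { return a.value == b.value; }
         static bool lt (const char_type&amp; a, const char_type&amp; b)
         { return a.value &lt; b.value; }

         static int compare (const char_type* s1, const char_type* s2, size_t n)
         {
           for (size_t i = 0; i &lt; n; ++i)
             {
               if (lt(s1[i], s2[i])) return -1;
               if (lt(s2[i], s1[i])) return 1;
             }
           return 0;
         }

         // Sequences are terminated by a my_char whose value is zero.
         static size_t length (const char_type* s)
         {
           size_t n = 0;
           while (s[n].value != 0)
             ++n;
           return n;
         }

         static const char_type* find (const char_type* s, size_t n,
                                       const char_type&amp; a)
         {
           for (size_t i = 0; i &lt; n; ++i)
             if (eq(s[i], a))
               return s + i;
           return 0;
         }

         // my_char is a POD, so raw memory copies are safe.
         static char_type* move (char_type* d, const char_type* s, size_t n)
         { return static_cast&lt;char_type*&gt;(memmove(d, s, n * sizeof(char_type))); }
         static char_type* copy (char_type* d, const char_type* s, size_t n)
         { return static_cast&lt;char_type*&gt;(memcpy(d, s, n * sizeof(char_type))); }
         static char_type* assign (char_type* s, size_t n, char_type a)
         {
           for (size_t i = 0; i &lt; n; ++i)
             s[i] = a;
           return s;
         }

         static char_type to_char_type (const int_type&amp; i)
         {
           char_type c = { static_cast&lt;unsigned short&gt;(i) };
           return c;
         }
         static int_type to_int_type (const char_type&amp; c) { return c.value; }
         static bool eq_int_type (const int_type&amp; a, const int_type&amp; b)
         { return a == b; }
         static int_type eof () { return static_cast&lt;int_type&gt;(-1); }
         static int_type not_eof (const int_type&amp; i)
         { return eq_int_type(i, eof()) ? 0 : i; }
       };
   }

   typedef std::basic_string&lt;my_char&gt; my_string;
   </pre>
   <p>With this in place, a <code>my_string</code> can be constructed from
      a zero-terminated array of <code>my_char</code>, and members such as
      <code>find</code>, <code>compare</code> and <code>+=</code> work
      through the traits functions.  Streams and locale facets are another
      matter, as noted above.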
   </p>
   <p>Other approaches were suggested in that same thread, such as providing
      more specializations and/or some helper types in the library to assist
      users writing such code.  So far nobody has had the time...
      <a href="../17_intro/contribute.html">do you?</a>
   </p>
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="6">Shrink-to-fit strings</a></h2>
   <!-- referenced by faq/index.html#5_9, update link if numbering changes -->
   <p>Starting with GCC 3.4, calling <code>s.reserve(res)</code> on a
      <code>string s</code> with <code>res &lt; s.capacity()</code> will
      reduce the string's capacity to <code>std::max(s.size(), res)</code>.
   </p>
   <p>This behaviour is suggested by the standard, but not required.  Prior
      to GCC 3.4, the following alternative can be used instead:
   </p>
   <pre>
      std::string(str.data(), str.size()).swap(str);
   </pre>
   <p>This is similar to the idiom for reducing a <code>vector</code>'s
      memory usage (see <a href='../faq/index.html#5_9'>FAQ 5.9</a>), but
      the regular copy constructor cannot be used because libstdc++'s
      <code>string</code> is reference-counted (Copy-On-Write): a plain
      copy would simply share the original's buffer and release no memory.
   </p>


<!-- ####################################################### -->

<hr />
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>


</body>
</html>