<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
   <meta name="AUTHOR" content="pme@gcc.gnu.org (Phil Edwards)" />
   <meta name="KEYWORDS" content="HOWTO, libstdc++, GCC, g++, libg++, STL" />
   <meta name="DESCRIPTION" content="HOWTO for the libstdc++ chapter 21." />
   <meta name="GENERATOR" content="vi and eight fingers" />
   <title>libstdc++-v3 HOWTO:  Chapter 21: Strings</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Start" href="../documentation.html" type="text/html"
  title="GNU C++ Standard Library" />
<link rel="Prev" href="../20_util/howto.html" type="text/html"
  title="General Utilities" />
<link rel="Next" href="../22_locale/howto.html" type="text/html"
  title="Localization" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
<link rel="Help" href="../faq/index.html" type="text/html" title="F.A.Q." />
</head>
<body>

<h1 class="centered"><a name="top">Chapter 21:  Strings</a></h1>

<p>Chapter 21 deals with the C++ strings library (a welcome relief).
</p>


<!-- ####################################################### -->
<hr />
<h1>Contents</h1>
<ul>
   <li><a href="#1">MFC's CString</a></li>
   <li><a href="#2">A case-insensitive string class</a></li>
   <li><a href="#3">Breaking a C++ string into tokens</a></li>
   <li><a href="#4">Simple transformations</a></li>
   <li><a href="#5">Making strings of arbitrary character types</a></li>
   <li><a href="#6">Shrink-to-fit strings</a></li>
</ul>

<hr />

<!-- ####################################################### -->

<h2><a name="1">MFC's CString</a></h2>
   <p>A common lament seen in various newsgroups deals with the Standard
      string class as opposed to the Microsoft Foundation Class called
      CString.  Often programmers realize that a standard portable
      answer is better than a proprietary nonportable one, but in porting
      their application from a Win32 platform, they discover that they
      are relying on special functions offered by the CString class.
   </p>
   <p>Things are not as bad as they seem.  In
      <a href="http://gcc.gnu.org/ml/gcc/1999-04n/msg00236.html">this
      message</a>, Joe Buck points out a few very important things:
   </p>
      <ul>
         <li>The Standard <code>string</code> supports all the operations
             that CString does, with three exceptions.
         </li>
         <li>Two of those exceptions (whitespace trimming and case
             conversion) are trivial to implement.  In fact, we do so
             on this page.
         </li>
         <li>The third is <code>CString::Format</code>, which allows formatting
             in the style of <code>sprintf</code>.  This deserves some mention:
         </li>
      </ul>
   <p><a name="1.1internal"> <!-- Coming from Chapter 27 -->
      The old libg++ library had a function called form(), which did much
      the same thing.  But for a Standard solution, you should use the
      stringstream classes.  These are the bridge between the iostream
      hierarchy and the string class, and they operate with regular
      streams seamlessly because they inherit from the iostream
      hierarchy.  A quick example:
      </a>
   </p>
   <pre>
   #include &lt;iostream&gt;
   #include &lt;string&gt;
   #include &lt;sstream&gt;

   std::string f (const std::string&amp; incoming)   // incoming is "foo  N"
   {
       std::istringstream   incoming_stream(incoming);
       std::string          the_word;
       int                  the_number;

       incoming_stream &gt;&gt; the_word        // extract "foo"
                       &gt;&gt; the_number;     // extract N

       std::ostringstream   output_stream;
       output_stream &lt;&lt; "The word was " &lt;&lt; the_word
                     &lt;&lt; " and 3*N was " &lt;&lt; (3*the_number);

       return output_stream.str();
   } </pre>
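   <p>For the specific job that <code>CString::Format</code> does, the
      manipulators from <code>&lt;iomanip&gt;</code> take the place of
      printf-style conversion specifications.  The following is a minimal
      sketch, not part of libstdc++; the function name
      <code>format_row</code> and the particular format it imitates are
      made up for illustration.
   </p>

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds roughly what the printf-style format
// "%-8s|%6.2f|%04d" would produce, using ostringstream and manipulators.
std::string format_row (const std::string& name, double value, int id)
{
  std::ostringstream out;
  out << std::left << std::setw(8) << name                  // like %-8s
      << '|'
      << std::right << std::fixed << std::setprecision(2)
      << std::setw(6) << value                              // like %6.2f
      << '|'
      << std::internal << std::setfill('0')
      << std::setw(4) << id;                                // like %04d
  return out.str();
}
```

   <p>Note the use of <code>std::internal</code> for the integer field: with
      a zero fill character it pads between the sign and the digits, as
      <code>%04d</code> does, whereas <code>std::right</code> would turn
      -5 into <code>00-5</code>.  Also note that <code>setw</code> applies
      only to the next output operation, while the other manipulators stay
      in effect on the stream.
   </p>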
   <p>A serious problem with CString is a design bug in its memory
      allocation.  Specifically, quoting from that same message:
   </p>
   <pre>
   CString suffers from a common programming error that results in
   poor performance.  Consider the following code:

   CString n_copies_of (const CString&amp; foo, unsigned n)
   {
           CString tmp;
           for (unsigned i = 0; i &lt; n; i++)
                   tmp += foo;
           return tmp;
   }

   This function is O(n^2), not O(n).  The reason is that each +=
   causes a reallocation and copy of the existing string.  Microsoft
   applications are full of this kind of thing (quadratic performance
   on tasks that can be done in linear time) -- on the other hand,
   we should be thankful, as it's created such a big market for high-end
   ix86 hardware. :-)

   If you replace CString with string in the above function, the
   performance is O(n).
   </pre>
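   <p>For comparison, here is the same function written against the
      Standard <code>string</code>.  It is a sketch, and the
      <code>reserve</code> call is our own optional addition: it allocates
      the final size once, so the loop never reallocates.  Without it, an
      implementation that grows its buffer geometrically still keeps each
      <code>+=</code> at amortized constant cost per appended character.
   </p>

```cpp
#include <string>

// The quoted function rewritten for std::string. reserve() is optional:
// it allocates room for the whole result up front, so each += below only
// copies foo's characters into place and never reallocates.
std::string n_copies_of (const std::string& foo, unsigned n)
{
  std::string tmp;
  tmp.reserve(foo.size() * n);
  for (unsigned i = 0; i < n; i++)
    tmp += foo;
  return tmp;
}
```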
   <p>Joe Buck also pointed out some other things to keep in mind when
      comparing CString and the Standard string class:
   </p>
      <ul>
         <li>CString permits access to its internal representation; coders
             who exploited that may have problems moving to <code>string</code>.
         </li>
         <li>Microsoft ships the source to CString (in the files
             MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation
             bug and rebuild your MFC libraries.
             <em><strong>Note:</strong> It looks like the CString shipped
             with VC++6.0 has fixed this, although it may in fact have been
             one of the VC++ SPs that did it.</em>
         </li>
         <li><code>string</code> operations like this have O(n) complexity
             <em>if the implementors do it correctly</em>.  The libstdc++
             implementors did it correctly.  Other vendors might not.
         </li>
         <li>While parts of the SGI STL are used in libstdc++-v3, their
             string class is not.  The SGI <code>string</code> is essentially
             <code>vector&lt;char&gt;</code> and does not do any reference
             counting like libstdc++-v3's does.  (It is O(n), though.)
             So if you're thinking about SGI's string or rope classes,
             you're now looking at four possibilities:  CString, the
             libstdc++ string, the SGI string, and the SGI rope, and this
             is all before any allocator or traits customizations!  (More
             choices than you can shake a stick at -- want fries with that?)
         </li>
      </ul>
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="2">A case-insensitive string class</a></h2>
   <p>The well-known-and-if-it-isn't-well-known-it-ought-to-be
      <a href="http://www.gotw.ca/gotw/index.htm">Guru of the Week</a>
      discussions held on Usenet covered this topic in January of 1998.
      Briefly, the challenge was, &quot;write a 'ci_string' class which
      is identical to the standard 'string' class, but is
      case-insensitive in the same way as the (common but nonstandard)
      C function stricmp():&quot;
   </p>
   <pre>
   ci_string s( "AbCdE" );

   // case insensitive
   assert( s == "abcde" );
   assert( s == "ABCDE" );

   // still case-preserving, of course
   assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
   assert( strcmp( s.c_str(), "abcde" ) != 0 ); </pre>

   <p>The solution is surprisingly easy.  The original answer pages
      on the GotW website were removed into cold storage, in
      preparation for
      <a href="http://cseng.aw.com/bookpage.taf?ISBN=0-201-61562-2">a
      published book of GotW notes</a>.  Before being
      put on the web, of course, it was posted on Usenet, and that
      posting containing the answer is <a href="gotw29a.txt">available
      here</a>.
   </p>
   <p>See?  Told you it was easy!</p>
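   <p>The shape of that answer, as it is usually presented, is to leave
      <code>basic_string</code> alone and supply a different traits class
      whose comparison functions ignore case.  A minimal sketch along those
      lines follows; the names <code>ci_char_traits</code> and
      <code>ci_string</code> are ours, and folding with
      <code>std::toupper</code> in the global C locale inherits all the
      limitations discussed below.
   </p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Traits class whose comparisons ignore case; everything else (assign,
// length, copy, move, ...) is inherited unchanged from char_traits<char>.
struct ci_char_traits : public std::char_traits<char>
{
  // Fold to upper case; the unsigned char cast keeps std::toupper's
  // argument in its valid range for negative char values.
  static char fold (char c)
  { return static_cast<char>(std::toupper(static_cast<unsigned char>(c))); }

  static bool eq (char a, char b) { return fold(a) == fold(b); }
  static bool lt (char a, char b) { return fold(a) < fold(b); }

  static int compare (const char* s1, const char* s2, std::size_t n)
  {
    for (std::size_t i = 0; i < n; ++i)
      {
        if (lt(s1[i], s2[i])) return -1;
        if (lt(s2[i], s1[i])) return 1;
      }
    return 0;
  }

  static const char* find (const char* s, std::size_t n, char a)
  {
    for (std::size_t i = 0; i < n; ++i)
      if (eq(s[i], a))
        return s + i;
    return 0;
  }
};

// Same storage as std::string, but case-insensitive comparisons.
typedef std::basic_string<char, ci_char_traits> ci_string;
```

   <p>With these definitions the assertions from the challenge hold.  Keep in
      mind that <code>ci_string</code> is a distinct type from
      <code>std::string</code>: assignment, concatenation and stream output
      between the two all need explicit conversions, for example through
      <code>c_str()</code>.
   </p>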
   <p><strong>Added June 2000:</strong>  The May issue of <u>C++ Report</u>
      contains
      a fascinating article by Matt Austern (yes, <em>the</em> Matt Austern)
      on why case-insensitive comparisons are not as easy as they seem,
      and why creating a class is the <em>wrong</em> way to go about it in
      production code.  (The GotW answer mentions one of the principal
      difficulties; his article mentions more.)
   </p>
   <p>Basically, this is &quot;easy&quot; only if you ignore some things,
      things which may be too important to your program to ignore.  (I chose
      to ignore them when originally writing this entry, and am surprised
      that nobody ever called me on it...)  The GotW question and answer
      remain useful instructional tools, however.
   </p>
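   <p>One alternative in the direction the article points is a comparison
      <em>function</em> that takes the locale as a parameter and leaves the
      strings themselves as ordinary <code>std::string</code>s.  The sketch
      below is our own illustration, not code from the article: the name
      <code>ci_equal</code> is hypothetical, and folding one character at a
      time with the <code>ctype&lt;char&gt;</code> facet is still a
      simplification that cannot handle letters such as the German sharp s,
      which upper-cases to two characters.
   </p>

```cpp
#include <locale>
#include <string>

// Hypothetical free function: case-insensitive equality of two ordinary
// strings, folding each character with the ctype<char> facet of `loc`
// (by default, a copy of the current global locale).
bool ci_equal (const std::string& a, const std::string& b,
               const std::locale& loc = std::locale())
{
  // With one-to-one folding, equal strings must have equal lengths.
  if (a.size() != b.size())
    return false;
  const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
  for (std::string::size_type i = 0; i < a.size(); ++i)
    if (ct.toupper(a[i]) != ct.toupper(b[i]))
      return false;
  return true;
}
```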
   <p><strong>Added September 2000:</strong>  James Kanze provided a link to a
      <a href="http://www.unicode.org/unicode/reports/tr21/">Unicode
      Technical Report discussing case handling</a>, which provides some
      very good information.
   </p>
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="3">Breaking a C++ string into tokens</a></h2>
   <p>The Standard C (and C++) function <code>strtok()</code> leaves a lot to
      be desired in terms of user-friendliness.  It's unintuitive, it
      destroys the character string on which it operates, and it requires
      you to handle all the memory problems.  But it does let the client
      code decide what to use to break the string into pieces; it allows
      you to choose the &quot;whitespace,&quot; so to speak.
   </p>
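   <p>To see those drawbacks concretely, here is a small wrapper that
      collects <code>strtok</code>'s tokens into a vector.  It is a sketch
      for comparison only, and the name <code>tokens_with_strtok</code> is
      hypothetical.  Note the writable copy of the input, the null pointer
      passed on every call after the first, and the hidden state that makes
      it unsafe to interleave with any other <code>strtok</code> loop.
   </p>

```cpp
#include <cstring>
#include <string>
#include <vector>

// Collect strtok's tokens. strtok overwrites each delimiter it finds with
// '\0', so it needs a writable, NUL-terminated copy of the input, and it
// remembers its position in hidden static state between calls.
std::vector<std::string> tokens_with_strtok (const std::string& input,
                                             const char* delims)
{
  std::vector<char> buf(input.begin(), input.end());
  buf.push_back('\0');                        // strtok needs a C string
  std::vector<std::string> out;
  for (char* tok = std::strtok(&buf[0], delims); tok != 0;
       tok = std::strtok(0, delims))          // 0 means "continue"
    out.push_back(tok);
  return out;
}
```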
   <p>A C++ implementation lets us keep the good things and fix those
      annoyances.  The implementation here is more intuitive (you only
      call it once, not in a loop with varying arguments), it does not
      affect the original string at all, and all the memory allocation
      is handled for you.
   </p>
   <p>It's called stringtok, and it's a template function.  It's given
      <a href="stringtok_h.txt">in this file</a> in a less-portable form than
      it could be, to keep this example simple (for example, see the
      comments on what kind of string it will accept).  The author uses
      a more general (but less readable) form of it for parsing command
      strings and the like.  If you compiled and ran this code using it:
   </p>
   <pre>
   #include &lt;iostream&gt;
   #include &lt;list&gt;
   #include &lt;string&gt;
   // ...plus the stringtok template from stringtok_h.txt

   const std::string          s (" this  \t is\t\n  a test  ");
   std::list&lt;std::string&gt;   ls;
   stringtok (ls, s);
   for (std::list&lt;std::string&gt;::const_iterator i = ls.begin();
        i != ls.end(); ++i)
   {
       std::cerr &lt;&lt; ':' &lt;&lt; (*i) &lt;&lt; ":\n";
   } </pre>
   <p>You would see this as output:
   </p>
   <pre>
   :this:
   :is:
   :a:
   :test: </pre>
   <p>with all the whitespace removed.  The original <code>s</code> is still
      available for use, <code>ls</code> will clean up after itself, and
      <code>ls.size()</code> will return how many tokens there were.
   </p>
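   <p>The core of such a function is a loop over
      <code>find_first_not_of</code> and <code>find_first_of</code>.  The
      following is a minimal sketch in that spirit, not the code from
      <code>stringtok_h.txt</code>; the name <code>stringtok_sketch</code>
      and its default delimiter set are our own assumptions.  Any container
      with <code>push_back</code> will do.
   </p>

```cpp
#include <list>      // the container used in the example above
#include <string>

// Hypothetical helper in the spirit of stringtok (not the linked file's
// code). Appends each token of `in` to `container`, treating every
// character of `delimiters` as a separator; `in` is never modified.
template <typename Container>
void stringtok_sketch (Container& container, const std::string& in,
                       const char* delimiters = " \t\n")
{
  // Start of the first token, or npos if the input is all delimiters.
  std::string::size_type i = in.find_first_not_of(delimiters);
  while (i != std::string::npos)
    {
      // [i, j) is the token; j is npos if the token runs to the end.
      const std::string::size_type j = in.find_first_of(delimiters, i);
      // substr clamps its length, so j - i is fine even when j is npos.
      container.push_back(in.substr(i, j - i));
      // Skip the delimiters after the token (npos in, npos out).
      i = in.find_first_not_of(delimiters, j);
    }
}
```

   <p>Because runs of delimiters are skipped, empty tokens are never
      produced; <code>"a,b,,c"</code> split on <code>","</code> yields three
      tokens, not four.
   </p>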
   <p>As always, there is a price paid here, in that stringtok is not
      as fast as strtok.  The other benefits usually outweigh that, however.
      <a href="stringtok_std_h.txt">Another version of stringtok is given
      here</a>, suggested by Chris King and tweaked by Petr Prikryl,
      and this one uses the
      transformation functions mentioned below.  If you are comfortable
      with reading the new function names, this version is recommended
      as an example.
   </p>
   <p><strong>Added February 2001:</strong>  Mark Wilden pointed out that the
      standard <code>std::getline()</code> function can be used with standard
      <a href="../27_io/howto.html">istringstreams</a> to perform
      tokenizing as well.  Build an istringstream from the input text,
      and then use std::getline with varying delimiters (the three-argument
      signature) to extract tokens into a string.
   </p>
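   <p>A sketch of that technique, assuming input of the hypothetical form
      <code>name=value;name=value</code>: the delimiter passed to
      <code>std::getline</code> alternates between <code>'='</code> and
      <code>';'</code>.  The function name <code>parse_pairs</code> is ours.
      Unlike stringtok, <code>getline</code> treats every delimiter
      occurrence individually, so two adjacent delimiters produce an empty
      token rather than being skipped.
   </p>

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical example: parse "name=value;name=value;..." by changing the
// delimiter given to the three-argument std::getline on each call.
std::map<std::string, std::string> parse_pairs (const std::string& text)
{
  std::istringstream in(text);
  std::map<std::string, std::string> result;
  std::string name, value;
  // Read a name up to '=', then its value up to ';' (or end of input).
  while (std::getline(in, name, '=') && std::getline(in, value, ';'))
    result[name] = value;
  return result;
}
```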
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="4">Simple transformations</a></h2>
   <p>Here are Standard, simple, and portable ways to perform common
      transformations on a <code>string</code> instance, such as &quot;convert
      to all upper case.&quot;  The word &quot;transformations&quot; is
      especially apt, because the standard template function
      <code>transform&lt;&gt;</code> is used.
   </p>
   <p>This code will go through some iterations (no pun).  Here's the
      simplistic version usually seen on Usenet:
   </p>
   <pre>
   #include &lt;string&gt;
   #include &lt;algorithm&gt;
   #include &lt;cctype&gt;      // old &lt;ctype.h&gt;

   // Cast to unsigned char first: passing a negative char value to
   // std::tolower/std::toupper is undefined behaviour.
   struct ToLower
   {
     char operator() (char c) const
     { return std::tolower(static_cast&lt;unsigned char&gt;(c)); }
   };

   struct ToUpper
   {
     char operator() (char c) const
     { return std::toupper(static_cast&lt;unsigned char&gt;(c)); }
   };

   int main()
   {
     std::string  s ("Some Kind Of Initial Input Goes Here");

     // Change everything into upper case
     std::transform (s.begin(), s.end(), s.begin(), ToUpper());

     // Change everything into lower case
     std::transform (s.begin(), s.end(), s.begin(), ToLower());

     // Change everything back into upper case, but store the
     // result in a different string
     std::string  capital_s;
     capital_s.resize(s.size());
     std::transform (s.begin(), s.end(), capital_s.begin(), ToUpper());
   } </pre>
   <p><span class="larger"><strong>Note</strong></span> that these calls all
      involve the global C locale through the use of the C functions
      <code>toupper/tolower</code>.  This is absolutely guaranteed to work --
      but <em>only</em> if the string contains <em>only</em> characters
      from the basic source character set, and there are <em>only</em>
      96 of those.  Which means that not even all English text can be
      represented (certain British spellings, proper names, and so forth).
      So, if all your input forevermore consists of only those 96
      characters (hahahahahaha), then you're done.
   </p>
   <p><span class="larger"><strong>Note</strong></span> that the
      <code>ToUpper</code> and <code>ToLower</code> function objects
      are needed because <code>toupper</code> and <code>tolower</code>
      are overloaded names (declared in <code>&lt;cctype&gt;</code> and
      <code>&lt;locale&gt;</code>) so the template-arguments for
      <code>transform&lt;&gt;</code> cannot be deduced, as explained in
      <a href="http://gcc.gnu.org/ml/libstdc++/2002-11/msg00180.html">this
      message</a>.  <!-- section 14.8.2.4 clause 16 in ISO 14882:1998
      if you're into that sort of thing -->
      At minimum, you can write short wrappers like
   </p>
   <pre>
   char toLower (char c)
   {
      return std::tolower(static_cast&lt;unsigned char&gt;(c));
   } </pre>
   <p>The correct method is to use a facet for a particular locale
      and call its conversion functions.  These are discussed more in
      Chapter 22; the specific part is
      <a href="../22_locale/howto.html#7">Correct Transformations</a>,
      which shows the final version of this code.  (Thanks to James Kanze
      for assistance and suggestions on all of this.)
   </p>
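   <p>As a taste of that approach, the following sketch upper-cases a copy
      of a string through the <code>ctype&lt;char&gt;</code> facet of a
      locale supplied by the caller (defaulting to the global locale).  The
      function name <code>to_upper_copy</code> is hypothetical; the calls it
      makes, <code>std::use_facet</code> and the range form of
      <code>ctype&lt;char&gt;::toupper</code>, are standard.  Like the C
      functions, it still maps exactly one character to one character.
   </p>

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-case a copy of `s` using the ctype<char>
// facet of `loc` instead of the C library's toupper.
std::string to_upper_copy (std::string s,
                           const std::locale& loc = std::locale())
{
  const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
  // ctype<char>::toupper(char* low, const char* high) converts in place.
  if (!s.empty())
    ct.toupper(&s[0], &s[0] + s.size());
  return s;
}
```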
   <p>Another common operation is trimming off excess whitespace.  Much
      like transformations, this task is trivial with the use of string's
      <code>find</code> family.  These examples are broken into multiple
      statements for readability:
   </p>
   <pre>
   std::string  str (" \t blah blah blah    \n ");

   // trim leading whitespace
   std::string::size_type  notwhite = str.find_first_not_of(" \t\n");
   str.erase(0,notwhite);

   // trim trailing whitespace (if str is now empty, notwhite is npos
   // and notwhite+1 wraps around to 0, so erase() has nothing to do)
   notwhite = str.find_last_not_of(" \t\n");
   str.erase(notwhite+1); </pre>
   <p>Obviously, the calls to <code>find</code> could be inserted directly
      into the calls to <code>erase</code>, in case your compiler does not
      optimize the named variable <code>notwhite</code> out of existence.
   </p>
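   <p>Packaged as a reusable function, the same idea might look like the
      following sketch.  The name <code>trim</code> and the default set of
      whitespace characters are assumptions; unlike the in-place version
      above, it returns a trimmed copy and checks for an all-whitespace
      string explicitly instead of relying on <code>npos + 1</code>
      wrapping around to zero.
   </p>

```cpp
#include <string>

// Hypothetical helper: return `str` without leading or trailing
// characters from `ws`.
std::string trim (const std::string& str, const char* ws = " \t\n")
{
  const std::string::size_type first = str.find_first_not_of(ws);
  if (first == std::string::npos)
    return std::string();                 // nothing but whitespace
  const std::string::size_type last = str.find_last_not_of(ws);
  return str.substr(first, last - first + 1);
}
```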
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="5">Making strings of arbitrary character types</a></h2>
   <p>The <code>std::basic_string</code> is tantalizingly general, in that
      it is parameterized on the type of the characters which it holds.
      In theory, you could whip up a Unicode character class and instantiate
      <code>std::basic_string&lt;my_unicode_char&gt;</code>, or, assuming
      that integers are wider than characters on your platform, maybe just
      declare variables of type <code>std::basic_string&lt;int&gt;</code>.
   </p>
   <p>That's the theory.  Remember, however, that basic_string has additional
      type parameters, which take default arguments based on the character
      type (called CharT here):
   </p>
   <pre>
      template &lt;typename CharT,
                typename Traits = char_traits&lt;CharT&gt;,
                typename Alloc = allocator&lt;CharT&gt; &gt;
      class basic_string { .... };</pre>
   <p>Now, <code>allocator&lt;CharT&gt;</code> will probably Do The Right
      Thing by default, unless you need to implement your own allocator
      for your characters.
   </p>
   <p>But <code>char_traits</code> takes more work.  The char_traits
      template is <em>declared</em> but not <em>defined</em>.
      That means there is only
   </p>
   <pre>
      template &lt;typename CharT&gt;
        struct char_traits
        {
            static void foo (type1 x, type2 y);
            ...
        };</pre>
   <p>and functions such as char_traits&lt;CharT&gt;::foo() are not
      actually defined anywhere for the general case.  The C++ standard
      permits this, because writing such a definition to fit every possible
      <code>CharT</code> cannot be done.  (For a time, in earlier versions of GCC,
      there was a mostly-correct implementation that let programmers be
      lazy.  :-)  But it broke under many situations, so it was removed.
      You are no longer allowed to be lazy and non-portable.)
   </p>
   <p>The C++ standard also requires that char_traits be specialized for
      <code>char</code> and <code>wchar_t</code>, and it is these template
      specializations that permit entities like
      <code>basic_string&lt;char,char_traits&lt;char&gt; &gt;</code> to work.
   </p>
   <p>If you want to use character types other than char and wchar_t,
      such as <code>unsigned char</code> and <code>int</code>, you will
      need to write specializations for them at the present time.  If you
      want to use your own special character class, then you have
      <a href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00163.html">a lot
      of work to do</a>, especially if you wish to use i18n features
      (facets require traits information but don't have a traits argument).
   </p>
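   <p>To give a sense of what writing the traits involves, here is a sketch
      of a complete traits class for <code>unsigned char</code>.  Rather
      than specializing <code>std::char_traits</code> itself (strictly, the
      standard only allows specializations involving user-defined types), it
      is a separate class, hypothetically named <code>uchar_traits</code>,
      passed explicitly as <code>basic_string</code>'s second template
      argument.  Using <code>int</code> for <code>int_type</code> and
      <code>-1</code> for <code>eof()</code> are assumptions that hold on
      platforms where integers are wider than characters.
   </p>

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>   // std::mbstate_t
#include <ios>      // std::streamoff, std::streampos
#include <string>

// Hypothetical traits class for strings of unsigned char.
struct uchar_traits
{
  typedef unsigned char  char_type;
  typedef int            int_type;    // holds every char_type plus eof()
  typedef std::streamoff off_type;
  typedef std::streampos pos_type;
  typedef std::mbstate_t state_type;

  static void assign (char_type& c1, const char_type& c2) { c1 = c2; }
  static bool eq (char_type a, char_type b) { return a == b; }
  static bool lt (char_type a, char_type b) { return a < b; }

  static int compare (const char_type* s1, const char_type* s2,
                      std::size_t n)
  {
    for (std::size_t i = 0; i < n; ++i)
      if (!eq(s1[i], s2[i]))
        return lt(s1[i], s2[i]) ? -1 : 1;
    return 0;
  }

  static std::size_t length (const char_type* s)
  {
    std::size_t n = 0;
    while (!eq(s[n], char_type()))     // stop at the terminating zero
      ++n;
    return n;
  }

  static const char_type* find (const char_type* s, std::size_t n,
                                const char_type& a)
  {
    for (std::size_t i = 0; i < n; ++i)
      if (eq(s[i], a))
        return s + i;
    return 0;
  }

  // move must cope with overlapping ranges, hence memmove.
  static char_type* move (char_type* d, const char_type* s, std::size_t n)
  { return n ? static_cast<char_type*>(std::memmove(d, s, n)) : d; }

  static char_type* copy (char_type* d, const char_type* s, std::size_t n)
  { return n ? static_cast<char_type*>(std::memcpy(d, s, n)) : d; }

  static char_type* assign (char_type* s, std::size_t n, char_type a)
  {
    for (std::size_t i = 0; i < n; ++i)
      s[i] = a;
    return s;
  }

  static char_type to_char_type (const int_type& i)
  { return static_cast<char_type>(i); }
  static int_type to_int_type (const char_type& c) { return c; }
  static bool eq_int_type (const int_type& a, const int_type& b)
  { return a == b; }
  static int_type eof () { return -1; }
  static int_type not_eof (const int_type& i)
  { return i == eof() ? 0 : i; }
};

typedef std::basic_string<unsigned char, uchar_traits> ustring;
```

   <p>This covers only <code>basic_string</code> itself; streams and locale
      facets for such a type are a separate, much larger job, as noted
      above.
   </p>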
   <p>One example of how to specialize char_traits is given <a
      href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00260.html">in
      this message</a>, which was then put into the file <code>
      include/ext/pod_char_traits.h</code> at a later date.  We agree
      that the way it's used with basic_string (scroll down to main())
      doesn't look nice, but that's because <a
      href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00236.html">the
      nice-looking first attempt</a> turned out to <a
      href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00242.html">not
      be conforming C++</a>, due to the rule that CharT must be a POD.
      (See how tricky this is?)
   </p>
   <p>Other approaches were suggested in that same thread, such as providing
      more specializations and/or some helper types in the library to assist
      users writing such code.  So far nobody has had the time...
      <a href="../17_intro/contribute.html">do you?</a>
   </p>
   <p>Return <a href="#top">to top of page</a> or
      <a href="../faq/index.html">to the FAQ</a>.
   </p>

<hr />
<h2><a name="6">Shrink-to-fit strings</a></h2>
   <!-- referenced by faq/index.html#5_9, update link if numbering changes -->
   <p>Starting with GCC 3.4, calling <code>s.reserve(res)</code> on a
      <code>string s</code> with <code>res &lt; s.capacity()</code> will
      reduce the string's capacity to <code>std::max(s.size(), res)</code>.
   </p>
   <p>This behaviour is suggested, but not required, by the standard.  Before
      GCC 3.4, the following alternative can be used instead:
   </p>
   <pre>
      std::string(str.data(), str.size()).swap(str);
   </pre>
   <p>This is similar to the idiom for reducing a <code>vector</code>'s
      memory usage (see <a href='../faq/index.html#5_9'>FAQ 5.9</a>), but
      the regular copy constructor cannot be used: libstdc++'s
      <code>string</code> is Copy-On-Write, so a plain copy would simply
      share the original's oversized buffer.  Constructing the temporary
      from <code>data()</code> and <code>size()</code> forces a fresh
      allocation of just the needed size, and <code>swap</code> then hands
      that buffer to <code>str</code>.
   </p>
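   <p>Compilers supporting C++11 also provide a <code>shrink_to_fit()</code>
      member, which is a non-binding request to reduce
      <code>capacity()</code> to <code>size()</code>.  A sketch of a helper
      that uses it when available and falls back to the swap idiom
      otherwise (the name <code>shrink_string</code> is hypothetical):
   </p>

```cpp
#include <string>

// Hypothetical helper: release unused capacity held by `str`.
void shrink_string (std::string& str)
{
#if __cplusplus >= 201103L
  // C++11 and later: a non-binding request to make capacity() == size().
  str.shrink_to_fit();
#else
  // Older compilers: build an exact-size copy from data()/size() (not via
  // the copy constructor, which may share a COW buffer) and swap it in.
  std::string(str.data(), str.size()).swap(str);
#endif
}
```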


<!-- ####################################################### -->

<hr />
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>


</body>
</html>