<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Chapter 7. Strings</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="std_contents.html" title="Part II. Standard Contents" /><link rel="prev" href="traits.html" title="Traits" /><link rel="next" href="localization.html" title="Chapter 8. Localization" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 7. Strings</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="traits.html">Prev</a> </td><th width="60%" align="center">Part II. Standard Contents</th><td width="20%" align="right"> <a accesskey="n" href="localization.html">Next</a></td></tr></table><hr /></div><div class="chapter"><div class="titlepage"><div><div><h2 class="title"><a id="std.strings"></a>Chapter 7. 
Strings <a id="id-1.3.4.5.1.1.1" class="indexterm"></a></h2></div></div></div><div class="toc"><p><strong>Table of Contents</strong></p><dl class="toc"><dt><span class="section"><a href="strings.html#std.strings.string">String Classes</a></span></dt><dd><dl><dt><span class="section"><a href="strings.html#strings.string.simple">Simple Transformations</a></span></dt><dt><span class="section"><a href="strings.html#strings.string.case">Case Sensitivity</a></span></dt><dt><span class="section"><a href="strings.html#strings.string.character_types">Arbitrary Character Types</a></span></dt><dt><span class="section"><a href="strings.html#strings.string.token">Tokenizing</a></span></dt><dt><span class="section"><a href="strings.html#strings.string.shrink">Shrink to Fit</a></span></dt><dt><span class="section"><a href="strings.html#strings.string.Cstring">CString (MFC)</a></span></dt></dl></dd></dl></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="std.strings.string"></a>String Classes</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="strings.string.simple"></a>Simple Transformations</h3></div></div></div><p>
      Here are Standard, simple, and portable ways to perform common
      transformations on a <code class="code">string</code> instance, such as
      "convert to all upper case." The word "transformations"
      is especially apt, because the standard template function
      <code class="code">transform<></code> is used.
   </p><p>
     This code goes through a few iterations.
Here's a simple
     version:
   </p><pre class="programlisting">
   #include <string>
   #include <algorithm>
   #include <cctype>      // old <ctype.h>

   struct ToLower
   {
     // Go via unsigned char: std::tolower is undefined for negative values.
     char operator() (char c) const
     { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); }
   };

   struct ToUpper
   {
     char operator() (char c) const
     { return static_cast<char>(std::toupper(static_cast<unsigned char>(c))); }
   };

   int main()
   {
     std::string  s ("Some Kind Of Initial Input Goes Here");

     // Change everything into upper case
     std::transform (s.begin(), s.end(), s.begin(), ToUpper());

     // Change everything into lower case
     std::transform (s.begin(), s.end(), s.begin(), ToLower());

     // Change everything back into upper case, but store the
     // result in a different string
     std::string  capital_s;
     capital_s.resize(s.size());
     std::transform (s.begin(), s.end(), capital_s.begin(), ToUpper());
   }
   </pre><p>
     <span class="emphasis"><em>Note</em></span> that these calls all
     involve the global C locale through the use of the C functions
     <code class="code">toupper/tolower</code>. This is absolutely guaranteed to work,
     but <span class="emphasis"><em>only</em></span> if the string contains <span class="emphasis"><em>only</em></span> characters
     from the basic source character set, and there are <span class="emphasis"><em>only</em></span>
     96 of those. This means that not even all English text can be
     represented (certain British spellings, proper names, and so forth).
     So, if all your input forevermore consists of only those 96
     characters (hahahahahaha), then you're done.
   </p><p><span class="emphasis"><em>Note</em></span> that the
     <code class="code">ToUpper</code> and <code class="code">ToLower</code> function objects
     are needed because <code class="code">toupper</code> and <code class="code">tolower</code>
     are overloaded names (declared in <code class="code"><cctype></code> and
     <code class="code"><locale></code>), so the template arguments for
     <code class="code">transform<></code> cannot be deduced, as explained in
     <a class="link" href="http://gcc.gnu.org/ml/libstdc++/2002-11/msg00180.html" target="_top">this
     message</a>.
     At a minimum, you can write short, non-overloaded wrapper functions such as
   </p><pre class="programlisting">
   char toLower (char c)
   {
      // std::tolower(c) is undefined if c < 0, so cast to unsigned char first.
      return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
   } </pre><p>(Thanks to James Kanze for assistance and suggestions on all of this.)
   </p><p>Another common operation is trimming off excess whitespace.  Much
      like transformations, this task is trivial with the use of string's
      <code class="code">find</code> family.
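</p><p>The same technique can be packaged as a reusable function.  The
      following is a minimal sketch, not part of libstdc++: <code class="code">trim</code>
      is a hypothetical name, and the default delimiter set assumes that only
      spaces, tabs and newlines count as whitespace.  It returns a trimmed
      copy rather than modifying its argument.
   </p>

```cpp
#include <string>

// Hypothetical helper (not part of libstdc++): returns a copy of `str`
// with leading and trailing characters from `whitespace` removed.
std::string trim(const std::string& str,
                 const char* whitespace = " \t\n")
{
    const std::string::size_type first = str.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return std::string();          // nothing but whitespace (or empty)

    // At least one non-whitespace character exists, so `last` is valid.
    const std::string::size_type last = str.find_last_not_of(whitespace);
    return str.substr(first, last - first + 1);
}
```

<p>Returning a copy leaves the input untouched; an in-place variant would
      call <code class="code">erase</code> on the string instead of building a
      <code class="code">substr</code>.
   </p><p>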
These examples are broken into multiple
      statements for readability:
   </p><pre class="programlisting">
   std::string  str (" \t blah blah blah    \n ");

   // trim leading whitespace
   std::string::size_type  notwhite = str.find_first_not_of(" \t\n");
   str.erase(0,notwhite);

   // trim trailing whitespace; if str is now empty, notwhite is npos,
   // npos+1 wraps around to 0, and the erase still does the right thing
   notwhite = str.find_last_not_of(" \t\n");
   str.erase(notwhite+1); </pre><p>The calls to <code class="code">find</code> could also be placed directly
      inside the calls to <code class="code">erase</code>, for example if you are concerned
      that your compiler will not optimize away the named variable.
   </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="strings.string.case"></a>Case Sensitivity</h3></div></div></div><p>
    </p><p>The well-known-and-if-it-isn't-well-known-it-ought-to-be
      <a class="link" href="http://www.gotw.ca/gotw/" target="_top">Guru of the Week</a>
      discussions held on Usenet covered this topic in January of 1998.
      Briefly, the challenge was, <span class="quote">“<span class="quote">write a 'ci_string' class which
      is identical to the standard 'string' class, but is
      case-insensitive in the same way as the (common but nonstandard)
      C function stricmp()</span>”</span>.
   </p><pre class="programlisting">
   ci_string s( "AbCdE" );

   // case insensitive
   assert( s == "abcde" );
   assert( s == "ABCDE" );

   // still case-preserving, of course
   assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
   assert( strcmp( s.c_str(), "abcde" ) != 0 ); </pre><p>The solution is surprisingly easy.
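</p><p>A minimal sketch of that kind of solution, written here for illustration
      in the spirit of the GotW 29 answer rather than copied from it: derive a
      traits class from <code class="code">std::char_traits<char></code>, override the
      character comparison functions so that they compare upper-cased
      characters, and instantiate <code class="code">std::basic_string</code> with it.  The names
      <code class="code">ci_char_traits</code> and <code class="code">ci_string</code> are assumptions, and the
      comparison uses <code class="code">std::toupper</code>, so it only handles single-byte
      characters in the current C locale.
   </p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits class: compares characters case-insensitively by
// upper-casing them with std::toupper (single-byte chars, current C locale).
// Everything not overridden here (length, copy, assign, ...) is inherited
// unchanged from std::char_traits<char>.
struct ci_char_traits : public std::char_traits<char>
{
    static char up(char c)
    {
        // std::toupper is undefined for negative values, so go via unsigned char.
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }

    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) <  up(b); }

    // Three-way comparison of the first n characters, ignoring case.
    static int compare(const char* s1, const char* s2, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }

    // Returns a pointer to the first character in [s, s+n) equal to a
    // (ignoring case), or a null pointer if there is none.
    static const char* find(const char* s, std::size_t n, char a)
    {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], a))
                return s + i;
        return nullptr;
    }
};

// A case-insensitive, case-preserving string type.
typedef std::basic_string<char, ci_char_traits> ci_string;
```

<p>With this definition the assertions in the challenge hold, because
      <code class="code">basic_string</code>'s comparisons are specified in terms of
      <code class="code">traits_type::compare</code>, while <code class="code">c_str()</code> still returns
      the characters exactly as they were stored.
   </p><p>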
The original answer was
      posted on Usenet, and a revised version appears in Herb Sutter's
      book <span class="emphasis"><em>Exceptional C++</em></span> and on his website as <a class="link" href="http://www.gotw.ca/gotw/029.htm" target="_top">GotW 29</a>.
   </p><p>See?  Told you it was easy!</p><p>
     <span class="emphasis"><em>Added June 2000:</em></span> The May 2000 issue of C++
      Report contains a fascinating <a class="link" href="http://lafstern.org/matt/col2_new.pdf" target="_top"> article</a> by
      Matt Austern (yes, <span class="emphasis"><em>the</em></span> Matt Austern) on why
      case-insensitive comparisons are not as easy as they seem, and
      why creating a class is the <span class="emphasis"><em>wrong</em></span> way to go
      about it in production code.  (The GotW answer mentions one of
      the principal difficulties; his article mentions more.)
   </p><p>In other words, this is "easy" only if you ignore some issues
      that may be too important for your program to ignore.  (I chose
      to ignore them when originally writing this entry, and am surprised
      that nobody ever called me on it...)  The GotW question and answer
      remain useful instructional tools, however.
   </p><p><span class="emphasis"><em>Added September 2000:</em></span>  James Kanze provided a link to a
      <a class="link" href="http://www.unicode.org/reports/tr21/tr21-5.html" target="_top">Unicode
      Technical Report discussing case handling</a>, which provides some
      very good information.
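</p><p>If you only need case-insensitive <span class="emphasis"><em>comparison</em></span>, a
      lighter alternative to a new string type is a set of comparison functions for
      ordinary <code class="code">std::string</code> objects.  The following is a minimal sketch;
      all of the names are hypothetical, and it handles only single-byte characters
      via <code class="code">std::tolower</code> in the current C locale, so it ignores exactly
      the locale and Unicode issues that the references above discuss.
   </p>

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helpers: compare ordinary std::strings case-insensitively
// instead of introducing a new string type. Single-byte characters only;
// std::tolower uses the current global C locale.
inline char ci_fold(char c)
{
    // std::tolower is undefined for negative values, so go via unsigned char.
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

inline bool ci_char_less(char a, char b)  { return ci_fold(a) < ci_fold(b); }
inline bool ci_char_equal(char a, char b) { return ci_fold(a) == ci_fold(b); }

// Strict weak ordering on strings that ignores case.
struct ci_less
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(a.begin(), a.end(),
                                            b.begin(), b.end(),
                                            ci_char_less);
    }
};

// Equality that ignores case: same length and pairwise-equal characters.
inline bool ci_equal(const std::string& a, const std::string& b)
{
    return a.size() == b.size()
        && std::equal(a.begin(), a.end(), b.begin(), ci_char_equal);
}
```

<p>Because <code class="code">ci_less</code> is a strict weak ordering, it can be passed as
      the comparison type of an associative container, for example
      <code class="code">std::map<std::string, int, ci_less></code>, so that keys differing
      only in case are treated as equivalent.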
   </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="strings.string.character_types"></a>Arbitrary Character Types</h3></div></div></div><p>
    </p><p><code class="code">std::basic_string</code> is tantalizingly general, in that
      it is parameterized on the type of the characters which it holds.
      In theory, you could whip up a Unicode character class and instantiate
      <code class="code">std::basic_string<my_unicode_char></code>, or, assuming
      that integers are wider than characters on your platform, maybe just
      declare variables of type <code class="code">std::basic_string<int></code>.
   </p><p>That's the theory.  Remember, however, that basic_string has additional
      type parameters, which take default arguments based on the character
      type (called <code class="code">CharT</code> here):
   </p><pre class="programlisting">
      template <typename CharT,
		typename Traits = char_traits<CharT>,
		typename Alloc = allocator<CharT> >
      class basic_string { .... };</pre><p>Now, <code class="code">allocator<CharT></code> will probably Do The Right
      Thing by default, unless you need to implement your own allocator
      for your characters.
   </p><p>But <code class="code">char_traits</code> takes more work.  The char_traits
      template is <span class="emphasis"><em>declared</em></span> but not <span class="emphasis"><em>defined</em></span>.
      That means there is only
   </p><pre class="programlisting">
      template <typename CharT>
	struct char_traits
	{
	    static void foo (type1 x, type2 y);
	    ...
	};</pre><p>and functions such as char_traits<CharT>::foo() are not
      actually defined anywhere for the general case.  The C++ standard
      permits this, because no single definition could be written to fit
      all possible CharT types.
   </p><p>The C++ standard also requires that char_traits be specialized for
      <code class="code">char</code> and <code class="code">wchar_t</code>, and it
      is these template specializations that permit entities like
      <code class="code">basic_string<char,char_traits<char>></code> to work.
   </p><p>If you want to use character types other than char and wchar_t,
      such as <code class="code">unsigned char</code> and <code class="code">int</code>, you will
      need suitable specializations for them.  For a time, in earlier
      versions of GCC, there was a mostly-correct implementation that
      let programmers be lazy, but it broke in many situations, so it
      was removed.  GCC 3.4 introduced a new implementation that mostly
      works and can be specialized even for <code class="code">int</code> and other
      built-in types.
   </p><p>If you want to use your own special character class, then you have
      <a class="link" href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00163.html" target="_top">a lot
      of work to do</a>, especially if you wish to use i18n features
      (facets require traits information but don't have a traits argument).
   </p><p>Another example of how to specialize char_traits was given <a class="link" href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00260.html" target="_top">on the
      mailing list</a> and at a later date was put into the file <code class="code">
      include/ext/pod_char_traits.h</code>.
We agree
      that the way it's used with basic_string (scroll down to main())
      doesn't look nice, but that's because <a class="link" href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00236.html" target="_top">the
      nice-looking first attempt</a> turned out <a class="link" href="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00242.html" target="_top">not
      to be conforming C++</a>, due to the rule that CharT must be a POD.
      (See how tricky this is?)
   </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="strings.string.token"></a>Tokenizing</h3></div></div></div><p>
    </p><p>The Standard C (and C++) function <code class="code">strtok()</code> leaves a lot to
      be desired in terms of user-friendliness.  It's unintuitive, it
      destroys the character string on which it operates, and it requires
      you to handle all the memory problems.  But it does let the client
      code decide what to use to break the string into pieces; it allows
      you to choose the "whitespace," so to speak.
   </p><p>A C++ implementation lets us keep the good things and fix those
      annoyances.  The implementation here is more intuitive (you only
      call it once, not in a loop with varying arguments), it does not
      affect the original string at all, and all the memory allocation
      is handled for you.
   </p><p>It's called stringtok, and it's a function template.  Sources are
      as below, in a less-portable form than it could be, to keep this
      example simple (for example, it accepts only a <code class="code">std::string</code>
      as input, not an arbitrary <code class="code">basic_string</code>).
   </p><pre class="programlisting">
#include <string>

// Splits `in` at any of the characters in `delimiters`, appending each
// non-empty token to `container` (any type with push_back(std::string)).
template <typename Container>
void
stringtok(Container &container, std::string const &in,
          const char * const delimiters = " \t\n")
{
    const std::string::size_type len = in.length();
          std::string::size_type i = 0;

    while (i < len)
    {
        // Eat leading whitespace
        i = in.find_first_not_of(delimiters, i);
        if (i == std::string::npos)
            return;   // Nothing left but white space

        // Find the end of the token
        std::string::size_type j = in.find_first_of(delimiters, i);

        // Push token
        if (j == std::string::npos)
        {
            container.push_back(in.substr(i));
            return;
        }
        else
            container.push_back(in.substr(i, j-i));

        // Set up for next loop
        i = j + 1;
    }
}
</pre><p>
     The author uses a more general (but less readable) form of it for
     parsing command strings and the like.  If you compiled and ran this
     code using it:
   </p><pre class="programlisting">
   #include <iostream>
   #include <list>

   int main()
   {
       std::list<std::string>  ls;
       stringtok (ls, " this \t is\t\n a test ");
       for (std::list<std::string>::const_iterator i = ls.begin();
            i != ls.end(); ++i)
       {
           std::cerr << ':' << (*i) << ":\n";
       }
   } </pre><p>You would see this as output:
   </p><pre class="programlisting">
   :this:
   :is:
   :a:
   :test: </pre><p>with all the whitespace removed.
The original input string is still
      available for use, <code class="code">ls</code> will clean up after itself, and
      <code class="code">ls.size()</code> will return how many tokens there were.
   </p><p>As always, there is a price paid here, in that stringtok is not
      as fast as strtok.  The other benefits usually outweigh that, however.
   </p><p><span class="emphasis"><em>Added February 2001:</em></span>  Mark Wilden pointed out that the
      standard <code class="code">std::getline()</code> function can be used with standard
      <code class="code">std::istringstream</code> objects to perform
      tokenizing as well.  Build an istringstream from the input text,
      and then use std::getline with varying delimiters (the three-argument
      signature) to extract tokens into a string.
   </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="strings.string.shrink"></a>Shrink to Fit</h3></div></div></div><p>
    </p><p>From GCC 3.4, calling <code class="code">s.reserve(res)</code> on a
      <code class="code">string s</code> with <code class="code">res < s.capacity()</code> will
      reduce the string's capacity to <code class="code">std::max(s.size(), res)</code>.
   </p><p>This behaviour is suggested, but not required, by the standard.
Prior
      to GCC 3.4, the following alternative can be used instead:
   </p><pre class="programlisting">
      std::string(str.data(), str.size()).swap(str);
   </pre><p>This is similar to the idiom for reducing
      a <code class="code">vector</code>'s memory usage
      (see <a class="link" href="../faq.html#faq.size_equals_capacity" title="7.8.">this FAQ
      entry</a>), but the regular copy constructor cannot be used
      because libstdc++'s <code class="code">string</code> is Copy-On-Write in GCC 3:
      a copy would share the original's oversized buffer instead of
      allocating a new one.
   </p><p>In <a class="link" href="status.html#status.iso.2011" title="C++ 2011">C++11</a> mode you can call
      <code class="code">s.shrink_to_fit()</code> to achieve the same effect as
      <code class="code">s.reserve(s.size())</code>; like <code class="code">reserve</code>, it is a
      non-binding request.
   </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="strings.string.Cstring"></a>CString (MFC)</h3></div></div></div><p>
    </p><p>A common lament seen in various newsgroups concerns the Standard
      string class as compared with CString, the string class of the
      Microsoft Foundation Classes.  Programmers often realize that a
      standard portable answer is better than a proprietary nonportable one,
      but in porting their application from a Win32 platform, they discover
      that they are relying on special functions offered by the CString class.
   </p><p>Things are not as bad as they seem.
In
      <a class="link" href="http://gcc.gnu.org/ml/gcc/1999-04n/msg00236.html" target="_top">this
      message</a>, Joe Buck points out a few very important things:
   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>The Standard <code class="code">string</code> supports all the operations
	     that CString does, with three exceptions.
	 </p></li><li class="listitem"><p>Two of those exceptions (whitespace trimming and case
	     conversion) are trivial to implement.  In fact, we do so
	     on this page.
	 </p></li><li class="listitem"><p>The third is <code class="code">CString::Format</code>, which allows formatting
	     in the style of <code class="code">sprintf</code>.  This deserves some mention:
	 </p></li></ul></div><p>
      The old libg++ library had a function called form(), which did much
      the same thing.  But for a Standard solution, you should use the
      stringstream classes.  These are the bridge between the iostream
      hierarchy and the string class, and they operate with regular
      streams seamlessly because they inherit from the iostream
      hierarchy.
A quick example:
   </p><pre class="programlisting">
   #include <iostream>
   #include <string>
   #include <sstream>

   // incoming is expected to look like "foo N"
   std::string f (const std::string& incoming)
   {
       std::istringstream incoming_stream(incoming);
       std::string the_word;
       int the_number = 0;

       incoming_stream >> the_word        // extract "foo"
                       >> the_number;     // extract N

       std::ostringstream output_stream;
       output_stream << "The word was " << the_word
                     << " and 3*N was " << (3*the_number);

       return output_stream.str();
   } </pre><p>A serious problem with CString is a design bug in its memory
      allocation.  Specifically, quoting from that same message:
   </p><pre class="programlisting">
   CString suffers from a common programming error that results in
   poor performance.  Consider the following code:

   CString n_copies_of (const CString& foo, unsigned n)
   {
           CString tmp;
           for (unsigned i = 0; i < n; i++)
                   tmp += foo;
           return tmp;
   }

   This function is O(n^2), not O(n).  The reason is that each +=
   causes a reallocation and copy of the existing string.  Microsoft
   applications are full of this kind of thing (quadratic performance
   on tasks that can be done in linear time) -- on the other hand,
   we should be thankful, as it's created such a big market for high-end
   ix86 hardware. :-)

   If you replace CString with string in the above function, the
   performance is O(n).
   </pre><p>Joe Buck also pointed out some other things to keep in mind when
      comparing CString and the Standard string class:
   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>CString permits access to its internal representation; coders
	     who exploited that may have problems moving to <code class="code">string</code>.
	 </p></li><li class="listitem"><p>Microsoft ships the source to CString (in the files
	     MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation
	     bug and rebuild your MFC libraries.
	     <span class="emphasis"><em><span class="emphasis"><em>Note:</em></span> It looks like the CString shipped
	     with VC++6.0 has fixed this, although it may in fact have been
	     one of the VC++ service packs that did it.</em></span>
	 </p></li><li class="listitem"><p>Appending to a <code class="code">string</code> in a loop like this has O(n) total complexity
	     <span class="emphasis"><em>if the implementors do it correctly</em></span>.  The libstdc++
	     implementors did it correctly.  Other vendors might not.
	 </p></li><li class="listitem"><p>While parts of the SGI STL are used in libstdc++, their
	     string class is not.  The SGI <code class="code">string</code> is essentially
	     <code class="code">vector<char></code> and does not do any reference
	     counting, unlike the GCC 3 libstdc++ string.  (It is O(n), though.)
	     So if you're thinking about SGI's string or rope classes,
	     you're now looking at four possibilities:  CString, the
	     libstdc++ string, the SGI string, and the SGI rope, and this
	     is all before any allocator or traits customizations!  (More
	     choices than you can shake a stick at.  Want fries with that?)
	 </p></li></ul></div></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="traits.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="std_contents.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="localization.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Traits </td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 8. Localization</td></tr></table></div></body></html>