<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
   <meta name="AUTHOR" content="pme@gcc.gnu.org (Phil Edwards)" />
   <meta name="KEYWORDS" content="HOWTO, libstdc++, GCC, g++, libg++, STL" />
   <meta name="DESCRIPTION" content="HOWTO for the libstdc++ chapter 22." />
   <meta name="GENERATOR" content="vi and eight fingers" />
   <title>libstdc++-v3 HOWTO:  Chapter 22: Localization</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Start" href="../documentation.html" type="text/html"
  title="GNU C++ Standard Library" />
<link rel="Prev" href="../21_strings/howto.html" type="text/html"
  title="Strings" />
<link rel="Next" href="../23_containers/howto.html" type="text/html"
  title="Containers" />
<link rel="Bookmark" href="locale.html" type="text/html" title="class locale" />
<link rel="Bookmark" href="codecvt.html" type="text/html" title="class codecvt" />
<link rel="Bookmark" href="ctype.html" type="text/html" title="class ctype" />
<link rel="Bookmark" href="messages.html" type="text/html" title="class messages" />
<link rel="Bookmark" href="http://www.research.att.com/~bs/3rd_loc0.html" type="text/html" title="Bjarne Stroustrup on Locales" />
<link rel="Bookmark" href="http://www.cantrip.org/locale.html" type="text/html"
  title="Nathan Myers on Locales" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
<link rel="Help" href="../faq/index.html" type="text/html" title="F.A.Q." />
</head>
<body>

<h1 class="centered"><a name="top">Chapter 22:  Localization</a></h1>

<p>Chapter 22 deals with the C++ localization facilities.
</p>
<!-- I wanted to write that sentence in something requiring an exotic font,
     like Cyrillic or Kanji.  Probably more work than such cuteness is worth,
     but I still think it'd be funny.
 -->


<!-- ####################################################### -->
<hr />
<h1>Contents</h1>
<ul>
   <li><a href="#1">class locale</a></li>
   <li><a href="#2">class codecvt</a></li>
   <li><a href="#3">class ctype</a></li>
   <li><a href="#4">class messages</a></li>
   <li><a href="#5">Bjarne Stroustrup on Locales</a></li>
   <li><a href="#6">Nathan Myers on Locales</a></li>
   <li><a href="#7">Correct Transformations</a></li>
</ul>

<!-- ####################################################### -->

<hr />
<h2><a name="1">class locale</a></h2>
   <p>Notes made during the implementation of locales can be found
      <a href="locale.html">here</a>.
   </p>

<hr />
<h2><a name="2">class codecvt</a></h2>
   <p>Notes made during the implementation of codecvt can be found
      <a href="codecvt.html">here</a>.
   </p>

   <p>The following is the abstract from the implementation notes:
   </p>
   <blockquote>
   The standard class codecvt attempts to address conversions between
   different character encoding schemes. In particular, the standard
   attempts to detail conversions between the implementation-defined
   wide characters (hereafter referred to as wchar_t) and the standard
   type char that is so beloved in classic "C" (which can
   now be referred to as narrow characters.)  This document attempts
   to describe how the GNU libstdc++-v3 implementation deals with the
   conversion between wide and narrow characters, and also presents a
   framework for dealing with the huge number of other encodings that
   iconv can convert, including Unicode and UTF8. Design issues and
   requirements are addressed, and examples of correct usage for both
   the required specializations for wide and narrow characters and the
   implementation-provided extended functionality are given.
   </blockquote>

<hr />
<h2><a name="3">class ctype</a></h2>
   <p>Notes made during the implementation of ctype can be found
      <a href="ctype.html">here</a>.
   </p>

<hr />
<h2><a name="4">class messages</a></h2>
   <p>Notes made during the implementation of messages can be found
      <a href="messages.html">here</a>.
   </p>

<hr />
<h2><a name="5">Bjarne Stroustrup on Locales</a></h2>
   <p>Dr. Bjarne Stroustrup has released a
      <a href="http://www.research.att.com/~bs/3rd_loc0.html">pointer</a>
      to Appendix D of his book,
      <a href="http://www.research.att.com/~bs/3rd.html">The C++
      Programming Language (3rd Edition)</a>.  It is a detailed
      description of locales and how to use them.
   </p>
   <p>He also writes:
   </p>
   <blockquote><em>
      Please note that I still consider this detailed description of
      locales beyond the needs of most C++ programmers.  It is written
      with experienced programmers in mind and novices will do best to
      avoid it.
   </em></blockquote>

<hr />
<h2><a name="6">Nathan Myers on Locales</a></h2>
   <p>An article entitled "The Standard C++ Locale" was
      published in Dr. Dobb's Journal and can be found
      <a href="http://www.cantrip.org/locale.html">here</a>.
   </p>

<hr />
<h2><a name="7">Correct Transformations</a></h2>
   <!-- Jumping directly to here from chapter 21. -->
   <p>A very common question on newsgroups and mailing lists is, "How
      do I do &lt;foo&gt; to a character string?"
      where &lt;foo&gt; is
      a task such as changing all the letters to uppercase or lowercase,
      testing for digits, etc.  A skilled and conscientious programmer
      will follow the question with another, "And how do I make the
      code portable?"
   </p>
   <p>(Poor innocent programmer, you have no idea the depths of trouble
      you are getting yourself into.  'Twould be best for your sanity if
      you dropped the whole idea and took up basket weaving instead.  No?
      Fine, you asked for it...)
   </p>
   <p>Changing the case of a letter, or classifying a character
      as numeric, graphical, etc., depends on the cultural context of the
      program at runtime.  So, first you must take the portability question
      into account.  Once you have localized the program to a particular
      natural language, only then can you perform the specific task.
      Unfortunately, specializing a function for a human language is not
      as simple as declaring
      <code> extern "Danish" int tolower (int); </code>.
   </p>
   <p>The C++ code to do all this proceeds in the same way.  First, a locale
      is created.  Then the facets of that locale are used to
      perform the individual tasks.
      Continuing the example from Chapter 21, we wish
      to use the following convenience functions:
   </p>
   <pre>
   namespace std {
     template &lt;class charT&gt;
       charT
       toupper (charT c, const locale&amp; loc);
     template &lt;class charT&gt;
       charT
       tolower (charT c, const locale&amp; loc);
   }</pre>
   <p>
      Each of these functions extracts the appropriate "facet" from the
      locale <em>loc</em> and calls the appropriate member function of that
      facet, passing <em>c</em> as its argument.  The resulting character
      is returned.
   </p>
   <p>For the C/POSIX locale, the results are the same as calling the
      classic C <code>toupper/tolower</code> function that was used in previous
      examples.  For other locales, the code should Do The Right Thing.
   </p>
   <p>Of course, these functions take a second argument, and the
      operation passed to the unary form of <code>std::transform</code>
      is called with only a single argument.  So we write simple wrapper
      structs to handle that.
   </p>
   <p>The next-to-final version of the code started in Chapter 21 looks like:
   </p>
   <pre>
   #include &lt;iterator&gt;    // for back_inserter
   #include &lt;locale&gt;
   #include &lt;string&gt;
   #include &lt;algorithm&gt;
   #include &lt;cctype&gt;      // old &lt;ctype.h&gt;

   // Wraps the two-argument std::toupper as a unary operation.
   // The locale is held by reference and must outlive this object.
   struct ToUpper
   {
     ToUpper(std::locale const&amp; l) : loc(l) {;}
     char operator() (char c) const  { return std::toupper(c,loc); }
   private:
     std::locale const&amp; loc;
   };

   // Same as ToUpper, but wraps the two-argument std::tolower.
   struct ToLower
   {
     ToLower(std::locale const&amp; l) : loc(l) {;}
     char operator() (char c) const  { return std::tolower(c,loc); }
   private:
     std::locale const&amp; loc;
   };

   int main ()
   {
      std::string  s("Some Kind Of Initial Input Goes Here");
      ToUpper      up(std::locale::classic());
      ToLower      down(std::locale::classic());

      // Change everything into upper case.
      std::transform(s.begin(), s.end(), s.begin(), up);

      // Change everything into lower case.
      std::transform(s.begin(), s.end(), s.begin(), down);

      // Change everything back into upper case, but store the
      // result in a different string.
      std::string  capital_s;
      std::transform(s.begin(), s.end(), std::back_inserter(capital_s), up);
   }</pre>
   <p>The <code>ToUpper</code> and <code>ToLower</code> structs can be
      generalized for other character types by making <code>operator()</code>
      a member function template.
   </p>
   <p>The final version of the code uses <code>bind2nd</code> to eliminate
      the wrapper structs, but the resulting code is tricky.  I have not
      shown it here because no compilers currently available to me will
      handle it.
   </p>


<!-- ####################################################### -->

<hr />
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>


</body>
</html>