<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
   <meta name="AUTHOR" content="pme@gcc.gnu.org (Phil Edwards)" />
   <meta name="KEYWORDS" content="HOWTO, libstdc++, GCC, g++, libg++, STL" />
   <meta name="DESCRIPTION" content="HOWTO for the libstdc++ chapter 22." />
   <meta name="GENERATOR" content="vi and eight fingers" />
   <title>libstdc++-v3 HOWTO:  Chapter 22: Localization</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Start" href="../documentation.html" type="text/html"
  title="GNU C++ Standard Library" />
<link rel="Prev" href="../21_strings/howto.html" type="text/html"
  title="Strings" />
<link rel="Next" href="../23_containers/howto.html" type="text/html"
  title="Containers" />
<link rel="Bookmark" href="locale.html" type="text/html" title="class locale" />
<link rel="Bookmark" href="codecvt.html" type="text/html" title="class codecvt" />
<link rel="Bookmark" href="ctype.html" type="text/html" title="class ctype" />
<link rel="Bookmark" href="messages.html" type="text/html" title="class messages" />
<link rel="Bookmark" href="http://www.research.att.com/~bs/3rd_loc0.html" type="text/html" title="Bjarne Stroustrup on Locales" />
<link rel="Bookmark" href="http://www.cantrip.org/locale.html" type="text/html" title="Nathan Myers on Locales" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
<link rel="Help" href="../faq/index.html" type="text/html" title="F.A.Q." />
</head>
<body>

<h1 class="centered"><a name="top">Chapter 22:  Localization</a></h1>

<p>Chapter 22 deals with the C++ localization facilities.
</p>
<!-- I wanted to write that sentence in something requiring an exotic font,
     like Cyrillic or Kanji.  Probably more work than such cuteness is worth,
     but I still think it'd be funny.
 -->


<!-- ####################################################### -->
<hr />
<h1>Contents</h1>
<ul>
   <li><a href="#1">class locale</a></li>
   <li><a href="#2">class codecvt</a></li>
   <li><a href="#3">class ctype</a></li>
   <li><a href="#4">class messages</a></li>
   <li><a href="#5">Bjarne Stroustrup on Locales</a></li>
   <li><a href="#6">Nathan Myers on Locales</a></li>
   <li><a href="#7">Correct Transformations</a></li>
</ul>

<!-- ####################################################### -->

<hr />
<h2><a name="1">class locale</a></h2>
   <p>Notes made during the implementation of locales can be found
      <a href="locale.html">here</a>.
   </p>

<hr />
<h2><a name="2">class codecvt</a></h2>
   <p>Notes made during the implementation of codecvt can be found
      <a href="codecvt.html">here</a>.
   </p>

   <p>The following is the abstract from the implementation notes:
   </p>
   <blockquote>
   The standard class codecvt attempts to address conversions between
   different character encoding schemes. In particular, the standard
   attempts to detail conversions between the implementation-defined
   wide characters (hereafter referred to as wchar_t) and the standard
   type char that is so beloved in classic &quot;C&quot; (which can
   now be referred to as narrow characters.)  This document attempts
   to describe how the GNU libstdc++-v3 implementation deals with the
   conversion between wide and narrow characters, and also presents a
   framework for dealing with the huge number of other encodings that
   iconv can convert, including Unicode and UTF8. Design issues and
   requirements are addressed, and examples of correct usage for both
   the required specializations for wide and narrow characters and the
   implementation-provided extended functionality are given.
   </blockquote>

<hr />
<h2><a name="3">class ctype</a></h2>
   <p>Notes made during the implementation of ctype can be found
      <a href="ctype.html">here</a>.
   </p>

<hr />
<h2><a name="4">class messages</a></h2>
   <p>Notes made during the implementation of messages can be found
      <a href="messages.html">here</a>.
   </p>

<hr />
<h2><a name="5">Bjarne Stroustrup on Locales</a></h2>
   <p>Dr. Bjarne Stroustrup has released a
      <a href="http://www.research.att.com/~bs/3rd_loc0.html">pointer</a>
      to Appendix D of his book,
      <a href="http://www.research.att.com/~bs/3rd.html">The C++
      Programming Language (3rd Edition)</a>.  It is a detailed
      description of locales and how to use them.
   </p>
   <p>He also writes:
   </p>
      <blockquote><em>
      Please note that I still consider this detailed description of
      locales beyond the needs of most C++ programmers.  It is written
      with experienced programmers in mind and novices will do best to
      avoid it.
      </em></blockquote>

<hr />
<h2><a name="6">Nathan Myers on Locales</a></h2>
   <p>An article entitled &quot;The Standard C++ Locale&quot; was
      published in Dr. Dobb's Journal and can be found
      <a href="http://www.cantrip.org/locale.html">here</a>.
   </p>

<hr />
<h2><a name="7">Correct Transformations</a></h2>
   <!-- Jumping directly to here from chapter 21. -->
   <p>A very common question on newsgroups and mailing lists is, &quot;How
      do I do &lt;foo&gt; to a character string?&quot; where &lt;foo&gt; is
      a task such as changing all the letters to uppercase or lowercase,
      testing for digits, etc.  A skilled and conscientious programmer
      will follow the question with another, &quot;And how do I make the
      code portable?&quot;
   </p>
   <p>(Poor innocent programmer, you have no idea the depths of trouble
      you are getting yourself into.  'Twould be best for your sanity if
      you dropped the whole idea and took up basket weaving instead.  No?
      Fine, you asked for it...)
   </p>
   <p>The task of changing the case of a letter or classifying a character
      as numeric, graphical, etc., depends entirely on the cultural context
      of the program at runtime.  So, first you must take the portability
      question into account.  Once you have localized the program to a
      particular natural language, only then can you perform the specific
      task.  Unfortunately, specializing a function for a human language is
      not as simple as declaring
      <code> extern &quot;Danish&quot; int tolower (int); </code>.
   </p>
   <p>The C++ code to do all this proceeds in the same way.  First, a locale
      is created.  Then functions that consult that locale are called to
      perform minor tasks.  Continuing the example from Chapter 21, we wish
      to use the following convenience functions, declared in
      <code>&lt;locale&gt;</code>:
   </p>
   <pre>
   namespace std {
     template &lt;class charT&gt;
       charT
       toupper (charT c, const locale&amp; loc);
     template &lt;class charT&gt;
       charT
       tolower (charT c, const locale&amp; loc);
   }</pre>
   <p>
      Each of these functions extracts the appropriate &quot;facet&quot;
      from the locale <em>loc</em> and calls the appropriate member function
      of that facet, passing <em>c</em> as its argument.  The resulting
      character is returned.
   </p>
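   <p>As a rough sketch of what that description amounts to, the helper
      below spells out the lookup by hand.  The name
      <code>upper_via_facet</code> is hypothetical and not part of the
      library; <code>std::use_facet</code> and the <code>std::ctype</code>
      facet are the standard pieces involved for case conversion.
   </p>
   <pre>
   #include &lt;locale&gt;

   // Hypothetical helper, for illustration only: performs by hand the
   // facet lookup that std::toupper(c, loc) is specified to do.
   template &lt;class charT&gt;
     charT
     upper_via_facet (charT c, const std::locale&amp; loc)
     {
       // Fetch the ctype facet for this character type from the locale,
       // then call the facet's own toupper member.
       return std::use_facet&lt;std::ctype&lt;charT&gt; &gt;(loc).toupper(c);
     }</pre>
   <p>Calling <code>upper_via_facet('a', std::locale::classic())</code>
      should therefore give the same result as
      <code>std::toupper('a', std::locale::classic())</code>.
   </p>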
   <p>For the C/POSIX locale, the results are the same as calling the
      classic C <code>toupper/tolower</code> functions that were used in
      previous examples.  For other locales, the code should Do The Right
      Thing.
   </p>
   <p>Of course, these functions take a second argument, and the operation
      that <code>std::transform</code> applies to each element can only take
      a single parameter.  So we write simple wrapper structs to handle that.
   </p>
   <p>The next-to-final version of the code started in Chapter 21 looks like:
   </p>
      <pre>
   #include &lt;iterator&gt;    // for back_inserter
   #include &lt;locale&gt;      // for std::locale and two-argument toupper/tolower
   #include &lt;string&gt;
   #include &lt;algorithm&gt;   // for std::transform
   #include &lt;cctype&gt;      // old &lt;ctype.h&gt;

   struct ToUpper
   {
       ToUpper(std::locale const&amp; l) : loc(l) {}
       char operator() (char c) const  { return std::toupper(c,loc); }
   private:
       // Stored by value so the wrapper stays valid even when it is
       // constructed from a temporary locale.
       std::locale loc;
   };

   struct ToLower
   {
       ToLower(std::locale const&amp; l) : loc(l) {}
       char operator() (char c) const  { return std::tolower(c,loc); }
   private:
       std::locale loc;
   };

   int main ()
   {
      std::string  s("Some Kind Of Initial Input Goes Here");
      ToUpper      up(std::locale::classic());
      ToLower      down(std::locale::classic());

      // Change everything into upper case.
      std::transform(s.begin(), s.end(), s.begin(), up);

      // Change everything into lower case.
      std::transform(s.begin(), s.end(), s.begin(), down);

      // Change everything back into upper case, but store the
      // result in a different string.
      std::string  capital_s;
      std::transform(s.begin(), s.end(), std::back_inserter(capital_s), up);
   }</pre>
   <p>The <code>ToUpper</code> and <code>ToLower</code> structs can be
      generalized for other character types by making <code>operator()</code>
      a member function template.
   </p>
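   <p>One possible shape for that generalization is sketched below.  The
      names <code>ToUpperAny</code> and <code>upper_copy</code> are
      hypothetical, chosen only for this illustration; the sketch assumes
      the character type has a <code>std::ctype</code> facet in the locale,
      as <code>char</code> and <code>wchar_t</code> do.
   </p>
   <pre>
   #include &lt;algorithm&gt;
   #include &lt;locale&gt;
   #include &lt;string&gt;

   // Hypothetical generalized wrapper: operator() is a member function
   // template, so the same object works for char, wchar_t, and so on.
   struct ToUpperAny
   {
       explicit ToUpperAny(std::locale const&amp; l) : loc(l) {}
       template &lt;class charT&gt;
         charT operator() (charT c) const  { return std::toupper(c,loc); }
   private:
       std::locale loc;   // held by value
   };

   // Hypothetical helper: returns an upper-cased copy of any string type.
   template &lt;class String&gt;
     String
     upper_copy (String s, std::locale const&amp; l = std::locale::classic())
     {
       std::transform(s.begin(), s.end(), s.begin(), ToUpperAny(l));
       return s;
     }</pre>
   <p>With this in place, <code>upper_copy(std::string("abc"))</code> and
      <code>upper_copy(std::wstring(L"abc"))</code> both go through the same
      wrapper.
   </p>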
   <p>The final version of the code uses <code>bind2nd</code> to eliminate
      the wrapper structs, but the resulting code is tricky.  I have not
      shown it here because no compilers currently available to me will
      handle it.
   </p>
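   <p>This is not that <code>bind2nd</code> version.  As a different way
      to drop the wrapper structs, and assuming a compiler that supports
      C++11 or later, a lambda can capture the locale and supply the
      second argument itself.  (<code>bind2nd</code> was deprecated in
      C++11 and removed in C++17.)  The function name
      <code>upper_with_lambda</code> is hypothetical.
   </p>
   <pre>
   #include &lt;algorithm&gt;
   #include &lt;locale&gt;
   #include &lt;string&gt;

   // Hypothetical helper: upper-cases a copy of s using the given locale.
   // Requires C++11 or later for the lambda expression.
   std::string
   upper_with_lambda (std::string s, const std::locale&amp; loc)
   {
      // The lambda takes the single parameter std::transform provides
      // and passes the captured locale as the second argument.
      std::transform(s.begin(), s.end(), s.begin(),
                     [&amp;loc](char c) { return std::toupper(c, loc); });
      return s;
   }</pre>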


<!-- ####################################################### -->

<hr />
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>


</body>
</html>