<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
   <meta name="AUTHOR" content="bkoz@redhat.com (Benjamin Kosnik)" />
   <meta name="KEYWORDS" content="HOWTO, libstdc++, GCC, g++, libg++, STL" />
   <meta name="DESCRIPTION" content="Notes on the messages implementation." />
   <title>Notes on the messages implementation.</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Start" href="../documentation.html" type="text/html"
  title="GNU C++ Standard Library" />
<link rel="Bookmark" href="howto.html" type="text/html" title="Localization" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
<link rel="Help" href="../faq/index.html" type="text/html" title="F.A.Q." />
</head>
<body>
<h1>
Notes on the messages implementation.
</h1>
<em>
prepared by Benjamin Kosnik (bkoz@redhat.com) on August 8, 2001
</em>

<h2>
1. Abstract
</h2>
<p>
The std::messages facet implements message retrieval functionality
equivalent to Java's java.text.MessageFormat, using either GNU gettext
or IEEE 1003.1-200x functions.
</p>

<h2>
2. What the standard says
</h2>
The std::messages facet is probably the most vaguely defined facet in
the standard library. Presumably this facility was built into
the standard library in order to translate string literals from one
locale to another. For instance, converting the "C" locale's
<code>const char* c = "please"</code> to a German-localized <code>"bitte"</code>
during program execution.

<blockquote>
22.2.7.1 - Template class messages [lib.locale.messages]
</blockquote>

This class has three public member functions, which directly
correspond to three protected virtual member functions.

The public member functions are:

<p>
<code>catalog open(const string&, const locale&) const</code>
</p>

<p>
<code>string_type get(catalog, int, int, const string_type&) const</code>
</p>

<p>
<code>void close(catalog) const</code>
</p>

<p>
While the virtual functions are:
</p>

<p>
<code>catalog do_open(const string&, const locale&) const</code>
</p>
<blockquote>
<em>
-1- Returns: A value that may be passed to get() to retrieve a
message, from the message catalog identified by the string name
according to an implementation-defined mapping.
The result can be used
until it is passed to close(). Returns a value less than 0 if no such
catalog can be opened.
</em>
</blockquote>

<p>
<code>string_type do_get(catalog, int, int, const string_type&) const</code>
</p>
<blockquote>
<em>
-3- Requires: A catalog cat obtained from open() and not yet closed.
-4- Returns: A message identified by arguments set, msgid, and dfault,
according to an implementation-defined mapping. If no such message can
be found, returns dfault.
</em>
</blockquote>

<p>
<code>void do_close(catalog) const</code>
</p>
<blockquote>
<em>
-5- Requires: A catalog cat obtained from open() and not yet closed.
-6- Effects: Releases unspecified resources associated with cat.
-7- Notes: The limit on such resources, if any, is implementation-defined.
</em>
</blockquote>


<h2>
3. Problems with "C" messages: thread safety,
over-specification, and assumptions.
</h2>
A couple of notes on the standard.

<p>
First, why is <code>messages_base::catalog</code> specified as a typedef
to int? This makes sense for implementations that use
<code>catopen</code>, but not for others. Fortunately, it is not heavily
used, so this is only a minor irritant.
</p>

<p>
Second, because the member functions are <code>const</code>, they
cannot save state in the facet. Information used in the 'open' member
function therefore cannot be stored away for use in 'get'. This is
unfortunate.
</p>

<p>
The 'open' member function in particular seems oddly designed, and
its signature is peculiar. Why specify a <code>const
string&</code> argument, for instance, instead of just <code>const
char*</code>? Or, why specify a <code>const locale&</code> argument that is
to be used in the 'get' member function? How, exactly, is this locale
argument useful? What was the intent? It might make sense if a locale
argument was associated with a given default message string in the
'open' member function, for instance. Quite murky and unclear, on
reflection.
</p>

<p>
Lastly, it seems odd that messages, which explicitly require code
conversion, don't use the codecvt facet. Because the messages facet
has only one template parameter, it is assumed that ctype, and not
codecvt, is to be used to convert between character sets.
</p>

<p>
It is implicitly assumed that the locale for the default message
string in 'get' is in the "C" locale. Thus, all source code is assumed
to be written in English, so translations are always from "en_US" to
other, explicitly named locales.
</p>

<h2>
4. Design and Implementation Details
</h2>
This is a relatively simple class, on the face of it. The standard
specifies very little in concrete terms, so generic implementations
that are conforming yet do very little are the norm. Adding
functionality that would be useful to programmers and comparable to
Java's java.text.MessageFormat takes a bit of work, and is highly
dependent on the capabilities of the underlying operating system.

<p>
Three different mechanisms have been provided, selectable via
configure flags:
</p>

<ul>
   <li> generic
   <p>
   This model does very little, and is what is used by default.
   </p>
   </li>

   <li> gnu
   <p>
   The gnu model is complete and fully tested. It's based on the
   GNU gettext package, which is part of glibc. It uses the functions
   <code>textdomain, bindtextdomain, gettext</code>
   to implement full functionality. Creating message
   catalogs is a relatively straightforward process and is
   lightly documented below, and fully documented in gettext's
   distributed documentation.
   </p>
   </li>

   <li> ieee_1003.1-200x
   <p>
   This is a complete, though untested, implementation based on
   the IEEE standard.
   The functions
   <code>catopen, catgets, catclose</code>
   are used to retrieve locale-specific messages given the
   appropriate message catalogs that have been constructed for
   their use. Note that the script <code>po2msg.sed</code>, which is part
   of the gettext distribution, can convert gettext catalogs into
   catalogs that <code>catopen</code> can use.
   </p>
   </li>
</ul>

<p>
A new, standards-conformant non-virtual member function signature was
added for 'open' so that a directory could be specified with a given
message catalog. This simplifies calling conventions for the gnu
model.
</p>

<p>
The rest of this document discusses details of the GNU model.
</p>

<p>
The messages facet, because it is retrieving and converting between
character sets, depends on the ctype and perhaps the codecvt facet in
a given locale. In addition, underlying "C" library locale support is
necessary for more than just the <code>LC_MESSAGES</code> mask:
<code>LC_CTYPE</code> is also necessary. To avoid any unpleasantness, all
bits of the "C" mask (i.e. <code>LC_ALL</code>) are set before retrieving
messages.
</p>

<p>
Making the message catalogs can be tricky at first, but becomes quite
simple with practice. For complete info, see the gettext
documentation.
Here's an idea of what is required:
</p>

<ul>
   <li> Make a source file with the required string literals
   that need to be translated. See
   <code>intl/string_literals.cc</code> for an example.
   </li>

   <li> Make initial catalog (see "4 Making the PO Template File"
   from the gettext docs).
   <p>
   <code>xgettext --c++ --debug string_literals.cc -o libstdc++.pot</code>
   </p>
   </li>

   <li> Make language and country-specific locale catalogs.
   <p>
   <code>cp libstdc++.pot fr_FR.po</code>
   </p>
   <p>
   <code>cp libstdc++.pot de_DE.po</code>
   </p>
   </li>

   <li> Edit localized catalogs in emacs so that strings are
   translated.
   <p>
   <code>emacs fr_FR.po</code>
   </p>
   </li>

   <li> Make the binary mo files.
   <p>
   <code>msgfmt fr_FR.po -o fr_FR.mo</code>
   </p>
   <p>
   <code>msgfmt de_DE.po -o de_DE.mo</code>
   </p>
   </li>

   <li> Copy the binary files into the correct directory structure.
   <p>
   <code>cp fr_FR.mo (dir)/fr_FR/LC_MESSAGES/libstdc++.mo</code>
   </p>
   <p>
   <code>cp de_DE.mo (dir)/de_DE/LC_MESSAGES/libstdc++.mo</code>
   </p>
   </li>

   <li> Use the new message catalogs.
   <p>
   <code>locale loc_de("de_DE");</code>
   </p>
   <p>
   <code>
   use_facet<messages<char> >(loc_de).open("libstdc++", locale(), dir);
   </code>
   </p>
   </li>
</ul>

<h2>
5. Examples
</h2>

<ul>
   <li> message converting, simple example using the GNU model.

<pre>
#include <iostream>
#include <locale>
#include <string>
using namespace std;

void test01()
{
  typedef messages<char>::catalog catalog;
  // Root of the installed catalog tree (a build-specific path).
  const char* dir =
  "/mnt/egcs/build/i686-pc-linux-gnu/libstdc++-v3/po/share/locale";
  const locale loc_de("de_DE");
  const messages<char>& mssg_de = use_facet<messages<char> >(loc_de);

  // The three-argument open() is the extension described in section 4.
  catalog cat_de = mssg_de.open("libstdc++", loc_de, dir);
  // The default string doubles as the lookup key in the gnu model.
  string s01 = mssg_de.get(cat_de, 0, 0, "please");
  string s02 = mssg_de.get(cat_de, 0, 0, "thank you");
  cout << "please in german:" << s01 << '\n';
  cout << "thank you in german:" << s02 << '\n';
  mssg_de.close(cat_de);
}
</pre>
   </li>
</ul>

More information can be found in the following testcases:
<ul>
<li> testsuite/22_locale/messages.cc </li>
<li> testsuite/22_locale/messages_byname.cc </li>
<li> testsuite/22_locale/messages_char_members.cc </li>
</ul>

<h2>
6. Unresolved Issues
</h2>
<ul>
<li> Things that are sketchy, or remain unimplemented:
   <ul>
      <li>_M_convert_from_char, _M_convert_to_char are in
      flux, depending on how the library ends up doing
      character set conversions. It might not be possible to
      do a real character set based conversion, due to the
      fact that the template parameter for messages is not
      enough to instantiate the codecvt facet (1 supplied,
      need at least 2 but would prefer 3).
      </li>

      <li> There are issues with gettext needing the global
      locale set to extract a message. This dependence on
      the global locale makes the current "gnu" model not
      MT-safe. Future versions of glibc, i.e. glibc 2.3.x, will
      fix this, and the C++ library bits are already in
      place.
      </li>
   </ul>
</li>

<li> Development versions of the GNU "C" library, glibc 2.3, will allow
   a more efficient, MT implementation of std::messages, and will
   allow the removal of the _M_name_messages data member. If this
   is done, it will change the library ABI.
   The C++ parts to
   support glibc 2.3 have already been coded, but are not in use:
   once this version of the "C" library is released, the marked
   parts of the messages implementation can be switched over to
   the new "C" library functionality.
</li>
<li> At some point in the near future, std::numpunct will probably use
   std::messages facilities to implement truename/falsename
   correctly. This is currently not done, but entries in
   libstdc++.pot have already been made for "true" and "false"
   string literals, so all that remains is the std::numpunct
   coding and the configure/make hassles to make the installed
   library search its own catalog. Currently the libstdc++.mo
   catalog is only searched for the testsuite cases involving
   messages members.
</li>

<li> The following member functions:

   <p>
   <code>
   catalog
   open(const basic_string<char>& __s, const locale& __loc) const;
   </code>
   </p>

   <p>
   <code>
   catalog
   open(const basic_string<char>&, const locale&, const char*) const;
   </code>
   </p>

   <p>
   do not actually return a "value less than 0 if no such catalog
   can be opened" as required by the standard in the "gnu"
   model. As of this writing, it is unknown how to query to see
   if a specified message catalog exists using the gettext
   package.
   </p>
</li>
</ul>

<h2>
7. Acknowledgments
</h2>
Ulrich Drepper for the character set explanations, gettext details,
and patient answering of late-night questions, Tom Tromey for the Java details.


<h2>
8. Bibliography / Referenced Documents
</h2>

Drepper, Ulrich, GNU libc (glibc) 2.2 manual. In particular, Chapter 7,
"Locales and Internationalization".

<p>
Drepper, Ulrich, Thread-Aware Locale Model, A proposal. This is a
draft document describing the design of glibc 2.3 MT locale
functionality.
</p>

<p>
Drepper, Ulrich, Numerous, late-night email correspondence
</p>

<p>
ISO/IEC 9899:1999 Programming languages - C
</p>

<p>
ISO/IEC 14882:1998 Programming languages - C++
</p>

<p>
Java 2 Platform, Standard Edition, v 1.3.1 API Specification. In
particular, java.util.Properties, java.text.MessageFormat,
java.util.Locale, java.util.ResourceBundle.
http://java.sun.com/j2se/1.3/docs/api
</p>

<p>
System Interface Definitions, Issue 7 (IEEE Std. 1003.1-200x)
The Open Group/The Institute of Electrical and Electronics Engineers, Inc.
In particular see lines 5268-5427.
http://www.opennc.org/austin/docreg.html
</p>

<p> GNU gettext tools, version 0.10.38, Native Language Support
Library and Tools.
http://sources.redhat.com/gettext
</p>

<p>
Langer, Angelika and Klaus Kreft, Standard C++ IOStreams and Locales,
Advanced Programmer's Guide and Reference, Addison Wesley Longman,
Inc. 2000. See page 725, Internationalized Messages.
</p>

<p>
Stroustrup, Bjarne, Appendix D, The C++ Programming Language, Special Edition, Addison Wesley, Inc. 2000
</p>

</body>
</html>