<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 27 November 2006 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 11 The Programmer's View</TITLE>
</HEAD>
<BODY>


<H1><A NAME="SEC164" HREF="gettext_toc.html#TOC164">11 The Programmer's View</A></H1>

<P>
One aim of the current message catalog implementation provided by
GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
installer wishes to do so. So we should first take a look at
the existing solutions we know about. The POSIX committee did not
manage to agree on either of the semi-official standards which we
describe below. In fact they could not agree on anything, so they decided
to include only an example of an interface. The major Unix vendors
are split between the two most important specifications: X/Open's
<CODE>catgets</CODE> and Uniforum's <CODE>gettext</CODE> interface. We describe both and
then explain our solution to this dilemma.

</P>



<H2><A NAME="SEC165" HREF="gettext_toc.html#TOC165">11.1 About <CODE>catgets</CODE></A></H2>
<P>
<A NAME="IDX1006"></A>

</P>
<P>
The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
process of creating this standard seemed too slow to some of
the Unix vendors, so they based their implementations on preliminary
versions of the standard. Of course this again leads to problems when
writing platform-independent programs: even using <CODE>catgets</CODE>
does not guarantee a uniform interface.

</P>
<P>
A personal comment: only a bunch of committee members
could have designed this interface. They never really tried to program
with it. It is a fast, memory-saving implementation, and a
user can happily live with it. But programmers hate it (at least I and
some others do).

</P>
<P>
But we must not forget one point: after all the trouble with transferring
the rights to Unix(tm), they finally came to X/Open, the very organization that
published this specification. This leads me to predict
that this interface will be part of future Unix standards (e.g. Spec1170) and
therefore part of all Unix implementations (at least those which are
<EM>allowed</EM> to bear this name).

</P>



<H3><A NAME="SEC166" HREF="gettext_toc.html#TOC166">11.1.1 The Interface</A></H3>
<P>
<A NAME="IDX1007"></A>

</P>
<P>
The interface to the <CODE>catgets</CODE> implementation consists of three
functions which correspond to those used in file access: <CODE>catopen</CODE>
to open the catalog for use, <CODE>catgets</CODE> for accessing the message
tables, and <CODE>catclose</CODE> for closing it after the work is done. Prototypes
for the functions and the needed definitions are in the
<CODE>&lt;nl_types.h&gt;</CODE> header file.

</P>
<P>
<A NAME="IDX1008"></A>
<CODE>catopen</CODE> is used like this:

</P>

<PRE>
nl_catd catd = catopen ("catalog_name", 0);
</PRE>

<P>
The function takes the name of the catalog as its first argument. This usually
refers to the name of the program or the package. The second parameter
is not further specified in the standard. I don't even know whether it
is implemented consistently among various systems, so the common advice
is to use <CODE>0</CODE> as the value. The return value is a handle to the
message catalog, analogous to the file descriptor returned by <CODE>open</CODE>.
On failure, <CODE>catopen</CODE> returns <CODE>(nl_catd) -1</CODE>.
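</P>
<P>
The following is a minimal sketch, not part of any standard API. The hypothetical helper <CODE>open_catalog</CODE> wraps <CODE>catopen</CODE> with the recommended <CODE>0</CODE> flag and checks for the <CODE>(nl_catd) -1</CODE> failure value. The catalog name a caller passes is an assumption; programs usually pass their own program or package name.
</P>

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: open the message catalog CATALOG_NAME.
   The second catopen argument is 0, following the common advice.
   On failure catopen returns (nl_catd) -1; callers can then keep
   using their built-in default strings. */
static nl_catd
open_catalog (const char *catalog_name)
{
  nl_catd catd = catopen (catalog_name, 0);

  if (catd == (nl_catd) -1)
    fprintf (stderr, "%s: message catalog not found, using defaults\n",
             catalog_name);
  return catd;
}
```

<P>
A program would typically call <CODE>open_catalog</CODE> once at startup with its own name and keep the returned handle for all later lookups.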

</P>
<P>
<A NAME="IDX1009"></A>
This handle is then used in the <CODE>catgets</CODE> function, which can
be used like this:

</P>

<PRE>
char *translation = catgets (catd, set_no, msg_id, "original string");
</PRE>

<P>
The first parameter is the catalog descriptor. The second parameter
specifies the set of messages in this catalog from which the message
identified by <CODE>msg_id</CODE> is obtained. <CODE>catgets</CODE> therefore uses
three-stage addressing:

</P>

<PRE>
catalog name => set number => message ID => translation
</PRE>

<P>
The fourth argument is not used to address the translation. It is given
as a default value, which is returned when one of the addressing stages fails. One
important thing to remember is that although the return type of <CODE>catgets</CODE>
is <CODE>char *</CODE>, the resulting string <EM>must not</EM> be changed. It
should rather be <CODE>const char *</CODE>, but the standard was published in
1988, one year before ANSI C.
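</P>
<P>
The return type is <CODE>char *</CODE>, but the string must not be modified, so it can help to wrap <CODE>catgets</CODE> in a function that returns <CODE>const char *</CODE>. The sketch below uses a hypothetical name, <CODE>catalog_text</CODE>. It also returns the default string directly when the catalog could not be opened, rather than relying on <CODE>catgets</CODE> to accept an invalid descriptor.
</P>

```c
#include <nl_types.h>

/* Hypothetical wrapper around catgets.  It returns const char * because
   the string returned by catgets must not be modified.  The lookup goes
   catalog (CATD) => set SET_NO => message MSG_ID; if any stage fails,
   catgets returns DEFAULT_TEXT. */
static const char *
catalog_text (nl_catd catd, int set_no, int msg_id, const char *default_text)
{
  /* Do not hand catopen's failure value to catgets. */
  if (catd == (nl_catd) -1)
    return default_text;
  return catgets (catd, set_no, msg_id, default_text);
}
```

<P>
When no catalog is available, every call simply yields the built-in default text, so the program keeps working untranslated.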

</P>
<P>
<A NAME="IDX1010"></A>
The last of these functions is used, and behaves, as expected:

</P>

<PRE>
catclose (catd);
</PRE>

<P>
After this call, no <CODE>catgets</CODE> call using the descriptor is legal anymore.

</P>


<H3><A NAME="SEC167" HREF="gettext_toc.html#TOC167">11.1.2 Problems with the <CODE>catgets</CODE> Interface?!</A></H3>
<P>
<A NAME="IDX1011"></A>

</P>
<P>
Now that this description seemed to be really easy, where are the
problems we speak of? In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain. The
reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
message ID. This has to be a numeric value for all messages in a single
set. You can probably imagine the problems of keeping such a list up to date while
changing the source code: add a new message here, remove one there. Of
course many tools have been developed to help organize this
chaos, but each of them fails in one aspect or another. We don't
want to say that the other approach has no problems, but they are far
easier to manage.
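</P>
<P>
The bookkeeping problem is easy to see in code. In the sketch below, the set and message numbers (<CODE>SET_MAIN</CODE>, <CODE>MSG_HELLO</CODE>, and so on) and the helper names are hypothetical. The catalog source that translators maintain would have to use exactly the same numbers. The sketch also releases the catalog with <CODE>catclose</CODE> once no more lookups are needed.
</P>

```c
#include <nl_types.h>

/* Hypothetical numbering.  The catalog source must use the same set and
   message numbers, so both places have to be edited in lockstep. */
enum { SET_MAIN = 1 };
enum
{
  MSG_HELLO = 1,        /* "Hello, world!"            */
  MSG_FILE_NOT_FOUND,   /* "File not found" (== 2)    */
  MSG_DISK_FULL         /* "Disk full"      (== 3)    */
};

/* Look up a message in SET_MAIN, falling back to the built-in text. */
static const char *
message_text (nl_catd catd, int msg_id, const char *english)
{
  if (catd == (nl_catd) -1)
    return english;
  return catgets (catd, SET_MAIN, msg_id, english);
}

/* Close the catalog; afterwards CATD must not be passed to catgets. */
static void
close_catalog (nl_catd catd)
{
  if (catd != (nl_catd) -1)
    catclose (catd);
}
```

<P>
Inserting a new message between <CODE>MSG_HELLO</CODE> and <CODE>MSG_FILE_NOT_FOUND</CODE> silently renumbers every later enumerator. Existing catalogs then return the wrong text until they are renumbered too, which is exactly the maintenance burden described above.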

</P>



<H2><A NAME="SEC168" HREF="gettext_toc.html#TOC168">11.2 About <CODE>gettext</CODE></A></H2>
<P>
<A NAME="IDX1012"></A>

</P>
<P>
The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
proposal. It was submitted there by Sun, who had implemented the
<CODE>gettext</CODE> function in SunOS 4, around 1990. Nowadays, the
<CODE>gettext</CODE> interface is specified by the OpenI18N standard.

</P>
<P>
The main point about this solution is that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here too, but this key is the message
itself (however long or short it is). See section <A HREF="gettext_11.html#SEC176">11.3 Comparing the Two Interfaces</A> for a more
detailed comparison of the two methods.

</P>
<P>
The following section contains a rather detailed description of the
interface. We make it that detailed because this is the interface
we chose for the GNU <CODE>gettext</CODE> Library. Programmers interested
in using this library will find this description useful.

</P>



<H3><A NAME="SEC169" HREF="gettext_toc.html#TOC169">11.2.1 The Interface</A></H3>
<P>
<A NAME="IDX1013"></A>

</P>
<P>
The minimal functionality an interface must have is a) to select a
domain the strings are coming from (a single domain for all programs is
not reasonable because its construction and maintenance would be difficult,
perhaps impossible), and b) to access a string in a selected domain.

</P>
<P>
This is essentially the description of the <CODE>gettext</CODE> interface. It
has a global domain which unqualified calls reference. Of course the
program can select this domain.

</P>

<PRE>
char *textdomain (const char *domain_name);
</PRE>

<P>
This function changes or queries the current global domain of the
<CODE>LC_MESSAGES</CODE> category. The
argument is a null-terminated string whose characters must be legal for
use in filenames. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
the function returns the current value. If no value has been set
before, the name of the default domain is returned: <EM>messages</EM>.
Please note that although the return value of <CODE>textdomain</CODE> is of
type <CODE>char *</CODE>, it must not be changed. It is also important to know
that no availability checks are made. If the domain is not
available, you will notice it only because no translations are provided.

</P>
<P>
To use a domain set by <CODE>textdomain</CODE>, the function

</P>

<PRE>
char *gettext (const char *msgid);
</PRE>

<P>
is to be used. This is the simplest reasonable form one can imagine.
The translation of the string <VAR>msgid</VAR> is returned if it is available
in the current domain. If it is not available, the argument itself is
returned. If the argument is <CODE>NULL</CODE>, the result is undefined.

</P>
<P>
One thing to keep in mind is that the call carries no explicit dependency on
the domain used. The current value of the domain for the
<CODE>LC_MESSAGES</CODE> locale category is used. If this changes between two
executions of the same <CODE>gettext</CODE> call in the program, the two calls
reference different message catalogs.
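</P>
<P>
Here is a small sketch of the query behaviour described above. Passing <CODE>NULL</CODE> to <CODE>textdomain</CODE> only reads the current domain, and <CODE>gettext</CODE> hands back its argument when the current domain has no translation. The domain name <CODE>myprog</CODE> and the helper <CODE>current_domain_is</CODE> are hypothetical.
</P>

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: return 1 if DOMAIN is the current global text
   domain, 0 otherwise.  textdomain (NULL) only queries; the returned
   string belongs to the library and must not be modified. */
static int
current_domain_is (const char *domain)
{
  const char *current = textdomain (NULL);

  return current != NULL && strcmp (current, domain) == 0;
}
```

<P>
Before any <CODE>textdomain</CODE> call, <CODE>current_domain_is ("messages")</CODE> holds. After <CODE>textdomain ("myprog")</CODE>, <CODE>current_domain_is ("myprog")</CODE> holds instead. With no <CODE>myprog</CODE> catalog installed, <CODE>gettext ("Hello, world!")</CODE> simply returns <CODE>"Hello, world!"</CODE>.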

</P>
<P>
In the easiest case, which is the one normally used in internationalized
packages, <CODE>textdomain</CODE> is called once at the beginning of execution,
setting the domain to a unique name, normally the package
name. In the following code, all strings which have to be translated are
filtered through the <CODE>gettext</CODE> function. That's all; the package speaks
your language.

</P>


<H3><A NAME="SEC170" HREF="gettext_toc.html#TOC170">11.2.2 Solving Ambiguities</A></H3>
<P>
<A NAME="IDX1014"></A>
<A NAME="IDX1015"></A>
<A NAME="IDX1016"></A>

</P>
<P>
While this single-domain approach works well for most applications, there
might be a need to get translations from more than one domain. Of
course one could switch between different domains with calls to
<CODE>textdomain</CODE>, but this is neither convenient nor fast. One
possible situation was under discussion at the time of this
writing: all
error messages of functions in the set of commonly used functions should
go into a separate domain <CODE>error</CODE>. This way we would only need
to translate them once.
Another case is messages from a library, as these <EM>have</EM> to be
independent of the current domain set by the application.
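</P>
<P>
The usual single-domain set-up can be sketched as follows. The package name <CODE>myprog</CODE> is a placeholder, and the <CODE>_</CODE> macro is only a widespread naming convention, not part of the interface. The <CODE>setlocale</CODE> call is included because without it a program stays in the <CODE>"C"</CODE> locale, where <CODE>gettext</CODE> returns its argument untranslated.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Placeholder package name; real packages usually get it from their
   build configuration. */
#define PACKAGE "myprog"

/* Conventional short name for marking translatable strings. */
#define _(msgid) gettext (msgid)

/* Call once at the beginning of main. */
static void
init_i18n (void)
{
  /* Select the user's locale from LC_ALL, LC_MESSAGES, LANG, ... */
  setlocale (LC_ALL, "");
  /* From now on, every gettext call looks in the PACKAGE domain. */
  textdomain (PACKAGE);
}
```

<P>
After <CODE>init_i18n ()</CODE>, a call such as <CODE>puts (_("Hello, world!"));</CODE> prints the translation when a <CODE>myprog</CODE> catalog exists for the user's language, and the English text otherwise.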

</P>
<P>
For these reasons there are two more functions to retrieve strings:

</P>

<PRE>
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
</PRE>

<P>
Both take an additional first argument, which corresponds
to the argument of <CODE>textdomain</CODE>. The third argument of
<CODE>dcgettext</CODE> allows using a locale category other than <CODE>LC_MESSAGES</CODE>,
but I really don't know where this can be useful. If
<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than
the known ones, the result is undefined. It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.

</P>
<P>
A second ambiguity can arise from the fact that perhaps more than one
domain has the same name. This can be solved by specifying where the
needed message catalog files can be found.
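</P>
<P>
For the library case mentioned above, a library can look up its own messages with <CODE>dgettext</CODE> and so stay independent of whatever domain the application selected. The domain name <CODE>mylib</CODE> and the helper names in this sketch are hypothetical.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical text domain owned by a library. */
#define MYLIB_DOMAIN "mylib"

/* Translate a library message; the application's textdomain call
   has no effect on this lookup. */
static const char *
mylib_text (const char *msgid)
{
  return dgettext (MYLIB_DOMAIN, msgid);
}

/* The same lookup with the locale category passed explicitly,
   as dcgettext requires. */
static const char *
mylib_text_in (const char *msgid, int category)
{
  return dcgettext (MYLIB_DOMAIN, msgid, category);
}
```

<P>
Calling <CODE>mylib_text_in (msgid, LC_MESSAGES)</CODE> is equivalent to <CODE>mylib_text (msgid)</CODE>. Both return <CODE>msgid</CODE> itself when no <CODE>mylib</CODE> translation is installed, even after the application has called <CODE>textdomain</CODE> with its own domain.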

</P>

<PRE>
char *bindtextdomain (const char *domain_name,
                      const char *dir_name);
</PRE>

<P>
Calling this function binds the given domain to a file in the specified
directory (how this file is determined is described below). In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be when using only <CODE>textdomain</CODE>). A
<CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding
associated with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself is
<CODE>NULL</CODE>, nothing happens and a <CODE>NULL</CODE> pointer is returned. Here
again, as for all the other functions, the return
value must not be changed!

</P>
<P>
It is important to remember that relative path names for the
<VAR>dir_name</VAR> parameter can cause trouble. Since the path is always
computed relative to the current directory, different results will be
achieved when the program executes a <CODE>chdir</CODE> command. Relative
paths should always be avoided to avoid such dependencies and
unreliability.
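</P>
<P>
If the directory is only known relative to the working directory, one way to follow this advice is to make it absolute before binding. The helper <CODE>bind_domain_absolute</CODE> below is a hypothetical sketch. It assumes a POSIX system, because it uses <CODE>getcwd</CODE>, and it uses fixed-size buffers for simplicity.
</P>

```c
#include <libintl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: bind DOMAIN to DIR, first turning a relative DIR
   into an absolute one, so that a later chdir() cannot change where the
   catalogs are looked up.  Returns the stored binding (which must not be
   modified), or NULL on failure. */
static const char *
bind_domain_absolute (const char *domain, const char *dir)
{
  char path[4096];

  if (dir[0] != '/')
    {
      char cwd[4096];
      int len;

      if (getcwd (cwd, sizeof cwd) == NULL)
        return NULL;
      len = snprintf (path, sizeof path, "%s/%s", cwd, dir);
      if (len < 0 || (size_t) len >= sizeof path)
        return NULL;            /* path too long */
      dir = path;
    }
  /* bindtextdomain stores its own copy of DIR. */
  return bindtextdomain (domain, dir);
}
```

<P>
With the binding made this way, <CODE>bindtextdomain (domain, NULL)</CODE> reports the absolute directory, and the lookup no longer depends on the program's current directory.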

</P>


<H3><A NAME="SEC171" HREF="gettext_toc.html#TOC171">11.2.3 Locating Message Catalog Files</A></H3>
<P>
<A NAME="IDX1017"></A>

</P>
<P>
Because many different languages for many different packages have to be
stored, we need some way to add this information to the message catalog
files. The usual way in Unix environments is to encode this information
in the file name, and this is also done here. The directory name given in
<CODE>bindtextdomain</CODE>'s second argument (or the default directory),
followed by the name of the locale, the locale category, and the domain name, are
concatenated:

</P>

<PRE>
<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
</PRE>

<P>
The default value for <VAR>dir_name</VAR> is system specific. For the GNU
library, and for packages adhering to its conventions, it's:

<PRE>
/usr/local/share/locale
</PRE>

<P>
<VAR>locale</VAR> is the name of the locale currently selected for the category
<CODE>LC_<VAR>category</VAR></CODE>.
For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
<CODE>LC_<VAR>category</VAR></CODE> is always <CODE>LC_MESSAGES</CODE>.<A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
The value of the locale is determined through
<CODE>setlocale (LC_<VAR>category</VAR>, NULL)</CODE>.
<A NAME="DOCF4" HREF="gettext_foot.html#FOOT4">(4)</A>
<CODE>dcgettext</CODE> specifies the locale category by its third argument.

</P>


<H3><A NAME="SEC172" HREF="gettext_toc.html#TOC172">11.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A></H3>
<P>
<A NAME="IDX1018"></A>
<A NAME="IDX1019"></A>

</P>
<P>
<CODE>gettext</CODE> not only looks up a translation in a message catalog, it
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

</P>
<P>
The output character set is, by default, the value of <CODE>nl_langinfo
(CODESET)</CODE>, which depends on the <CODE>LC_CTYPE</CODE> part of the current
locale. But programs which store strings in a locale-independent way
(e.g. UTF-8) can request that <CODE>gettext</CODE> and related functions
return the translations in that encoding by using the
<CODE>bind_textdomain_codeset</CODE> function.

</P>
<P>
Note that the <VAR>msgid</VAR> argument to <CODE>gettext</CODE> is not subject to
character set conversion. Also, when <CODE>gettext</CODE> does not find a
translation for <VAR>msgid</VAR>, it returns <VAR>msgid</VAR> unchanged,
independently of the current output character set. It is therefore
recommended that all <VAR>msgid</VAR>s be US-ASCII strings.

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>bind_textdomain_codeset</B> <I>(const char *<VAR>domainname</VAR>, const char *<VAR>codeset</VAR>)</I>
<DD><A NAME="IDX1020"></A>
The <CODE>bind_textdomain_codeset</CODE> function can be used to specify the
output character set for message catalogs for domain <VAR>domainname</VAR>.
The <VAR>codeset</VAR> argument must be a valid codeset name which can be used
for the <CODE>iconv_open</CODE> function, or a null pointer.

</P>
<P>
If the <VAR>codeset</VAR> parameter is the null pointer,
<CODE>bind_textdomain_codeset</CODE> returns the currently selected codeset
for the domain with the name <VAR>domainname</VAR>. It returns <CODE>NULL</CODE> if
no codeset has yet been selected.

</P>
<P>
The <CODE>bind_textdomain_codeset</CODE> function can be used several times.
If it is used multiple times with the same <VAR>domainname</VAR> argument, the
later call overrides the settings made by the earlier one.

</P>
<P>
The <CODE>bind_textdomain_codeset</CODE> function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
<CODE>bind_textdomain_codeset</CODE>, the return value is <CODE>NULL</CODE> and the
global variable <VAR>errno</VAR> is set accordingly.
</DL>

</P>


<H3><A NAME="SEC173" HREF="gettext_toc.html#TOC173">11.2.5 Using contexts for solving ambiguities</A></H3>
<P>
<A NAME="IDX1021"></A>
<A NAME="IDX1022"></A>
<A NAME="IDX1023"></A>
<A NAME="IDX1024"></A>

</P>
<P>
One place where the <CODE>gettext</CODE> functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts their
length.
But strings which do not contain entire sentences, or at
least large fragments of a sentence, may appear in more than one
situation in the program and might need different translations there. This is
especially true for the one-word strings which are frequently used in
GUI programs.

</P>
<P>
As a consequence, many people say that the <CODE>gettext</CODE> approach is
wrong and that <CODE>catgets</CODE>, which indeed does not have this problem,
should be used instead. But there is a very simple and powerful method to
handle this kind of problem with the <CODE>gettext</CODE> functions.

</P>
<P>
Contexts can be added to strings to be translated. A context-dependent
translation lookup is a search for the translation of a given string
that is limited to a given context. The translation for the same string
in a different context can be different. The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.

</P>
<P>
The <TT>‘gettext.h’</TT> include file contains the lookup macros for strings
with contexts. They are implemented as thin macros and inline functions
over the functions from <CODE>&lt;libintl.h&gt;</CODE>.
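</P>
<P>
To make the idea concrete, here is a sketch of how a context lookup can be layered on <CODE>dcgettext</CODE>. It relies on the way GNU gettext stores a context in MO files: the context string, an EOT character (<CODE>'\004'</CODE>), and the msgid are joined into one lookup key. The function <CODE>my_dcpgettext_expr</CODE> is a hypothetical stand-in, not the actual <TT>‘gettext.h’</TT> code.
</P>

```c
#include <libintl.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of a context-dependent lookup.  The MO file key for
   a message with context is MSGCTXT, then '\004' (EOT), then MSGID. */
static const char *
my_dcpgettext_expr (const char *domain, const char *msgctxt,
                    const char *msgid, int category)
{
  size_t ctxt_len = strlen (msgctxt);
  size_t id_len = strlen (msgid);
  char *key = malloc (ctxt_len + 1 + id_len + 1);
  const char *translation;

  if (key == NULL)
    return msgid;               /* out of memory: use the msgid itself */
  memcpy (key, msgctxt, ctxt_len);
  key[ctxt_len] = '\004';
  memcpy (key + ctxt_len + 1, msgid, id_len + 1);   /* copies the NUL too */

  translation = dcgettext (domain, key, category);
  /* Without a translation, dcgettext returns its argument (KEY itself);
     report the plain MSGID then, never the EOT-joined key. */
  if (translation == key)
    translation = msgid;
  free (key);
  return translation;
}
```

<P>
With a suitable catalog installed, calls with the contexts <CODE>"Menu|File|"</CODE> and <CODE>"Menu|Printer|"</CODE> can yield different translations of <CODE>"Open"</CODE>. Without one, both simply return <CODE>"Open"</CODE>.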

</P>
<P>
<A NAME="IDX1025"></A>

<PRE>
const char *pgettext (const char *msgctxt, const char *msgid);
</PRE>

<P>
In a call of this macro, <VAR>msgctxt</VAR> and <VAR>msgid</VAR> must be string
literals. The macro returns the translation of <VAR>msgid</VAR>, restricted
to the context given by <VAR>msgctxt</VAR>.

</P>
<P>
The <VAR>msgctxt</VAR> string is visible in the PO file to the translator.
You should try to make it somewhat canonical and never changing, because
every time you change a <VAR>msgctxt</VAR>, the translator will have to review
the translation of <VAR>msgid</VAR>.

</P>
<P>
Finding a canonical <VAR>msgctxt</VAR> string that doesn't change over time can
be hard. But you shouldn't use the name of the file or class containing the
<CODE>pgettext</CODE> call, because renaming a file or a class is a common
development task, and it shouldn't cause translator work. Also you shouldn't
use a comment in the form of a complete English sentence as <VAR>msgctxt</VAR>,
because orthography or grammar changes are often applied to such sentences,
and again, they shouldn't force the translator to do a review.

</P>
<P>
The <SAMP>‘p’</SAMP> in <SAMP>‘pgettext’</SAMP> stands for “particular”: <CODE>pgettext</CODE>
fetches a particular translation of the <VAR>msgid</VAR>.

</P>
<P>
<A NAME="IDX1026"></A>
<A NAME="IDX1027"></A>

<PRE>
const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);
</PRE>

<P>
These are generalizations of <CODE>pgettext</CODE>. They behave similarly to
<CODE>dgettext</CODE> and <CODE>dcgettext</CODE>, respectively. The <VAR>domain_name</VAR>
argument defines the translation domain. The <VAR>category</VAR> argument
allows using a locale facet other than <CODE>LC_MESSAGES</CODE>.

</P>
<P>
As an example, consider the following fictional situation.
A GUI program
has a menu bar with the following entries:

</P>

<PRE>
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
</PRE>

<P>
To have the strings <CODE>File</CODE>, <CODE>Printer</CODE>, <CODE>Open</CODE>,
<CODE>New</CODE>, <CODE>Select</CODE>, and <CODE>Connect</CODE> translated, there has to be
a call to a function of the <CODE>gettext</CODE> family at some point in the code.
But in two places the string passed into the function would be
<CODE>Open</CODE>. The translations might not be the same, and therefore we
are in the dilemma described above.

</P>
<P>
What distinguishes the two places is the menu path from the menu root to
the particular menu entries:

</P>

<PRE>
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
</PRE>

<P>
The context is thus the menu path without its last part.
So, the calls
look like this:

</P>

<PRE>
pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")
</PRE>

<P>
Whether or not to use the <SAMP>‘|’</SAMP> character at the end of the context is a
matter of style.

</P>
<P>
For more complex cases, where <VAR>msgctxt</VAR> or <VAR>msgid</VAR> is not a
string literal, more general macros are available:

</P>
<P>
<A NAME="IDX1028"></A>
<A NAME="IDX1029"></A>
<A NAME="IDX1030"></A>

<PRE>
const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);
</PRE>

<P>
Here <VAR>msgctxt</VAR> and <VAR>msgid</VAR> can be arbitrary string-valued expressions.
These macros are more general.
But when both argument expressions
are string literals, the macros without the <SAMP>‘_expr’</SAMP> suffix are more
efficient.

</P>


<H3><A NAME="SEC174" HREF="gettext_toc.html#TOC174">11.2.6 Additional functions for plural forms</A></H3>
<P>
<A NAME="IDX1031"></A>

</P>
<P>
The functions of the <CODE>gettext</CODE> family described so far (and all the
<CODE>catgets</CODE> functions as well) have one problem in the real world
which has been completely neglected in all existing approaches: the
handling of plural forms.

</P>
<P>
Looking through Unix source code from before the time anybody thought about
internationalization (and, sadly, even afterwards), one can often find
code similar to the following:

</P>

<PRE>
   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
</PRE>

<P>
After the first complaints from people internationalizing the code, programmers
either avoided formulations like this completely or used strings like
<CODE>"file(s)"</CODE>. Both look unnatural and should be avoided. First
First 669*946379e7Schristostries to solve the problem correctly looked like this: 670*946379e7Schristos 671*946379e7Schristos</P> 672*946379e7Schristos 673*946379e7Schristos<PRE> 674*946379e7Schristos if (n == 1) 675*946379e7Schristos printf ("%d file deleted", n); 676*946379e7Schristos else 677*946379e7Schristos printf ("%d files deleted", n); 678*946379e7Schristos</PRE> 679*946379e7Schristos 680*946379e7Schristos<P> 681*946379e7SchristosBut this does not solve the problem. It helps languages where the 682*946379e7Schristosplural form of a noun is not simply constructed by adding an 683*946379e7Schristos‘s’ 684*946379e7Schristosbut that is all. Once again people fell into the trap of believing the 685*946379e7Schristosrules their language is using are universal. But the handling of plural 686*946379e7Schristosforms differs widely between the language families. For example, 687*946379e7SchristosRafal Maszkowski <CODE><rzm@mat.uni.torun.pl></CODE> reports: 688*946379e7Schristos 689*946379e7Schristos</P> 690*946379e7Schristos 691*946379e7Schristos<BLOCKQUOTE> 692*946379e7Schristos<P> 693*946379e7SchristosIn Polish we use e.g. plik (file) this way: 694*946379e7Schristos 695*946379e7Schristos<PRE> 696*946379e7Schristos1 plik 697*946379e7Schristos2,3,4 pliki 698*946379e7Schristos5-21 pliko'w 699*946379e7Schristos22-24 pliki 700*946379e7Schristos25-31 pliko'w 701*946379e7Schristos</PRE> 702*946379e7Schristos 703*946379e7Schristos<P> 704*946379e7Schristosand so on (o' means 8859-2 oacute which should be rather okreska, 705*946379e7Schristossimilar to aogonek). 
</BLOCKQUOTE>

<P>
There are two things which can differ between languages (and even inside
language families):

</P>

<UL>
<LI>

The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an
‘s’)
is hardly found in German.

<LI>

The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romanic and Germanic languages,
since there the number is the same (there are two).

But other language families have only one form or many forms. More
information on this is given in a separate section.
</UL>

<P>
The consequence of this is that application writers should not try to
solve the problem in their code. Doing so would amount to localization, since
the result would only work for certain hardcoded language environments.
Instead, the extended <CODE>gettext</CODE> interface should be used.

</P>
<P>
These extra functions take two strings and a numerical argument
instead of a single key string.
The idea behind this is that, using
the numerical argument and the first string as a key, the implementation
can select the right plural form according to rules specified by the
translator. The two string arguments are also used to provide a return
value in case no message catalog is found (similar to the normal
<CODE>gettext</CODE> behavior). In this case the rules for Germanic languages
are used: the first string argument is assumed to be the singular
form and the second the plural form.

</P>
<P>
This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written in
a Germanic language. This is a limitation, but since the GNU C library
(as well as the GNU <CODE>gettext</CODE> package) is written as part of the
GNU project, and the GNU coding standards require programs to be
written in English, this solution nevertheless fulfills its
purpose.

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>ngettext</B> <I>(const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
<DD><A NAME="IDX1032"></A>
The <CODE>ngettext</CODE> function is similar to the <CODE>gettext</CODE> function
in that it finds the message catalogs in the same way. But it takes two
extra arguments. The <VAR>msgid1</VAR> parameter must contain the singular
form of the string to be converted.
It is also used as the key for the
search in the catalog. The <VAR>msgid2</VAR> parameter is the plural form.
The parameter <VAR>n</VAR> is used to determine the plural form. If no
message catalog is found, <VAR>msgid1</VAR> is returned if <CODE>n == 1</CODE>,
otherwise <VAR>msgid2</VAR>.

</P>
<P>
An example of the use of this function is:

</P>

<PRE>
printf (ngettext ("%d file removed", "%d files removed", n), n);
</PRE>

<P>
Please note that the numeric value <VAR>n</VAR> has to be passed to the
<CODE>printf</CODE> function as well. It is not sufficient to pass it only to
<CODE>ngettext</CODE>.

</P>
<P>
In the English singular case, the number, which is always 1, can be replaced with
"one":

</P>

<PRE>
printf (ngettext ("One file removed", "%d files removed", n), n);
</PRE>

<P>
This works because the <SAMP>‘printf’</SAMP> function evaluates but otherwise ignores
excess arguments that are not consumed by the format string.
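</P>
<P>
The no-catalog fallback rule and the excess-argument behavior can be
illustrated with a small, self-contained C sketch. It does not call
<CODE>libintl</CODE>. <CODE>fallback_ngettext</CODE> and <CODE>format_removed</CODE>
are hypothetical helpers written only for this illustration, and they model
only what <CODE>ngettext</CODE> returns when no message catalog is found.
</P>

```c
#include <stdio.h>

/* Hypothetical helper, not part of libintl: models only the no-catalog
   case of ngettext, which returns MSGID1 when N == 1 and MSGID2
   otherwise. */
const char *
fallback_ngettext (const char *msgid1, const char *msgid2,
                   unsigned long int n)
{
  return n == 1 ? msgid1 : msgid2;
}

/* Hypothetical helper: writes the "files removed" message for N into
   BUF, as the printf example above would print it without a catalog.
   N is always passed to snprintf.  When the selected format has no %d
   (the "One file removed" case), the extra argument is evaluated but
   ignored.  Returns the length reported by snprintf. */
int
format_removed (char *buf, size_t size, int n)
{
  return snprintf (buf, size,
                   fallback_ngettext ("One file removed",
                                      "%d files removed", n),
                   n);
}
```

<P>
With this sketch, <CODE>format_removed (buf, sizeof buf, 1)</CODE> stores
<CODE>"One file removed"</CODE> in <VAR>buf</VAR>, and
<CODE>format_removed (buf, sizeof buf, 3)</CODE> stores <CODE>"3 files removed"</CODE>.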
</P>
<P>
It is also possible to use this function when the strings don't contain a
cardinal number:

</P>

<PRE>
puts (ngettext ("Delete the selected file?",
                "Delete the selected files?",
                n));
</PRE>

<P>
In this case the number <VAR>n</VAR> is only used to choose the plural form.
</DL>

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>dngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
<DD><A NAME="IDX1033"></A>
The <CODE>dngettext</CODE> function is similar to the <CODE>dgettext</CODE> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <CODE>ngettext</CODE> handles them.
</DL>

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>dcngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>, int <VAR>category</VAR>)</I>
<DD><A NAME="IDX1034"></A>
The <CODE>dcngettext</CODE> function is similar to the <CODE>dcgettext</CODE> function in the
way the message catalog is selected.
The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <CODE>ngettext</CODE> handles them.
</DL>

</P>
<P>
Now, how do these functions solve the problem of the plural forms?
Without input from linguists (which was not available), it was not
possible to determine whether there are only a few different ways in
which plural forms are built or whether their number can grow with
every newly supported language.

</P>
<P>
Therefore the implemented solution lets the translator specify
the rules for selecting the plural form. Since the formula varies
with every language, this is the only viable solution apart from
hardcoding the information in the code (which would still have to be
extensible so as not to prevent the use of new languages).

</P>
<P>
<A NAME="IDX1035"></A>
<A NAME="IDX1036"></A>
<A NAME="IDX1037"></A>
The information about the plural form selection has to be stored in the
header entry of the PO file (the one with the empty <CODE>msgid</CODE> string).
The plural form information looks like this:

</P>

<PRE>
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
</PRE>

<P>
The <CODE>nplurals</CODE> value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following <CODE>plural</CODE> is an expression in C language syntax, with
these exceptions: no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is <CODE>n</CODE>. Spaces are
allowed in the expression, but backslash-newlines are not; in the
examples below the backslash-newlines are present for formatting purposes
only. This expression is evaluated whenever one of the functions
<CODE>ngettext</CODE>, <CODE>dngettext</CODE>, or <CODE>dcngettext</CODE> is called. The
numeric value passed to these functions is substituted for all uses
of the variable <CODE>n</CODE> in the expression. The resulting value
must then be greater than or equal to zero and smaller than the value
given for <CODE>nplurals</CODE>.

</P>
<P>
<A NAME="IDX1038"></A>
The following rules are known at this point. The languages are listed
together with their families. But this does not necessarily mean the
information can be generalized to the whole family (as can easily be seen
in the table below).<A NAME="DOCF5" HREF="gettext_foot.html#FOOT5">(5)</A>

</P>
<DL COMPACT>

<DT>Only one form:
<DD>
Some languages require only a single form.
There is no distinction
between the singular and plural forms. An appropriate header entry
would look like this:


<PRE>
Plural-Forms: nplurals=1; plural=0;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Asian family
<DD>
Japanese, Korean, Vietnamese
<DT>Turkic/Altaic family
<DD>
Turkish
</DL>

<DT>Two forms, singular used for one only
<DD>
This is the form used in most existing programs, since it is what English
uses. A header entry would look like this:


<PRE>
Plural-Forms: nplurals=2; plural=n != 1;
</PRE>

(Note: this relies on the fact that boolean expressions in C
have the value zero or one.)

Languages with this property include:

<DL COMPACT>

<DT>Germanic family
<DD>
Danish, Dutch, English, Faroese, German, Norwegian, Swedish
<DT>Finno-Ugric family
<DD>
Estonian, Finnish
<DT>Latin/Greek family
<DD>
Greek
<DT>Semitic family
<DD>
Hebrew
<DT>Romanic family
<DD>
Italian, Portuguese, Spanish
<DT>Artificial
<DD>
Esperanto
</DL>

Another language using the same header entry is:

<DL COMPACT>

<DT>Finno-Ugric family
<DD>
Hungarian
</DL>

Hungarian does not appear to have a plural if you look at sentences involving
cardinal numbers. For example, “1 apple” is “1 alma”, and “123 apples” is
“123 alma”. But when the number is not explicit, the distinction between
singular and plural exists: “the apple” is “az alma”, and “the apples” is
“az almák”. Since <CODE>ngettext</CODE> has to support both types of sentences,
Hungarian is classified here, under “two forms”.

<DT>Two forms, singular used for zero and one
<DD>
This is an exceptional case within the language family.
The header entry would be:


<PRE>
Plural-Forms: nplurals=2; plural=n>1;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Romanic family
<DD>
French, Brazilian Portuguese
</DL>

<DT>Three forms, special case for zero
<DD>
The header entry would be:


<PRE>
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Baltic family
<DD>
Latvian
</DL>

<DT>Three forms, special cases for one and two
<DD>
The header entry would be:


<PRE>
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Celtic
<DD>
Gaeilge (Irish)
</DL>

<DT>Three forms, special case for numbers ending in 00 or [2-9][0-9]
<DD>
The header entry would be:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Romanic family
<DD>
Romanian
</DL>

<DT>Three forms, special case for numbers ending in 1[2-9]
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Baltic family
<DD>
Lithuanian
</DL>

<DT>Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Croatian, Serbian, Russian, Ukrainian
</DL>

<DT>Three forms, special cases for 1 and 2, 3, 4
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Slovak, Czech
</DL>

<DT>Three forms, special case for one and some numbers ending in 2, 3, or 4
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Polish
</DL>

<DT>Four forms, special case for one and all numbers ending in 02, 03, or 04
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Slovenian
</DL>
</DL>

<P>
You might now ask: <CODE>ngettext</CODE> handles only numbers <VAR>n</VAR> of type
<SAMP>‘unsigned long’</SAMP>. What about larger integer types? What about negative
numbers? What about floating-point numbers?

</P>
<P>
About larger integer types, such as <SAMP>‘uintmax_t’</SAMP> or
<SAMP>‘unsigned long long’</SAMP>: they can be handled by reducing the value to a
range that fits in an <SAMP>‘unsigned long’</SAMP>. Simply casting the value to
<SAMP>‘unsigned long’</SAMP> would not do the right thing, since it would treat
<CODE>ULONG_MAX + 1</CODE> like zero, <CODE>ULONG_MAX + 2</CODE> like singular, and
so on. Here you can exploit the fact that all the plural form formulas listed
above eventually become periodic, with a period that is a divisor of 100
(or 1000 or 1000000). So, when you reduce a large value to another one in
the range [1000000, 1999999] that ends in the same 6 decimal digits, you
can assume that it will lead to the same plural form selection.
The following code
does this:

</P>

<PRE>
#include &lt;inttypes.h&gt;   /* uintmax_t, PRIuMAX */
#include &lt;limits.h&gt;     /* ULONG_MAX */
uintmax_t nbytes = ...;
printf (ngettext ("The file has %"PRIuMAX" byte.",
                  "The file has %"PRIuMAX" bytes.",
                  (nbytes > ULONG_MAX
                   ? (nbytes % 1000000) + 1000000
                   : nbytes)),
        nbytes);
</PRE>

<P>
Negative and floating-point values usually represent physical entities for
which singular and plural don't clearly apply. In such cases, there is no
need to use <CODE>ngettext</CODE>; a simple <CODE>gettext</CODE> call with a form suitable
for all values will do. For example:

</P>

<PRE>
printf (gettext ("Time elapsed: %.3f seconds"),
        num_milliseconds * 0.001);
</PRE>

<P>
Even if <VAR>num_milliseconds</VAR> happens to be a multiple of 1000, the output

<PRE>
Time elapsed: 1.000 seconds
</PRE>

<P>
is acceptable in English, and similarly for other languages.
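</P>
<P>
Every <CODE>Plural-Forms</CODE> expression uses C syntax, so the rules in the table
above can be transcribed directly into C functions. This is a convenient way
to spot-check a formula against sample numbers. The following sketch does this
for a few of the listed languages and repeats the <CODE>uintmax_t</CODE> reduction
from the <VAR>nbytes</VAR> example. The function names are hypothetical and chosen
only for this illustration. In real programs the formula lives in the PO file
header and is evaluated by <CODE>ngettext</CODE>, not by application code.
</P>

```c
#include <limits.h>
#include <stdint.h>

/* Each function transcribes one Plural-Forms expression from the table
   above and returns the selected form index, in [0, nplurals).
   The names are illustrative only; they are not part of libintl. */

/* Most Germanic languages: nplurals=2; plural=n != 1; */
int
plural_english (unsigned long int n)
{
  return n != 1;
}

/* French, Brazilian Portuguese: nplurals=2; plural=n>1; */
int
plural_french (unsigned long int n)
{
  return n > 1;
}

/* Croatian, Serbian, Russian, Ukrainian: nplurals=3 */
int
plural_russian (unsigned long int n)
{
  return n % 10 == 1 && n % 100 != 11 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Polish: nplurals=3 */
int
plural_polish (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Slovenian: nplurals=4 */
int
plural_slovenian (unsigned long int n)
{
  return n % 100 == 1 ? 0
         : n % 100 == 2 ? 1
         : n % 100 == 3 || n % 100 == 4 ? 2
         : 3;
}

/* The reduction from the nbytes example: values that do not fit in an
   unsigned long are mapped into [1000000, 1999999], keeping their last
   six decimal digits.  Where uintmax_t and unsigned long have the same
   width, the first branch is never taken. */
unsigned long int
plural_argument (uintmax_t n)
{
  return n > ULONG_MAX ? (unsigned long int) (n % 1000000 + 1000000)
                       : (unsigned long int) n;
}
```

<P>
For instance, <CODE>plural_polish (22)</CODE> and <CODE>plural_polish (1000022)</CODE>
both return 1 (“pliki”), which matches the Polish table quoted earlier. By
contrast, <CODE>plural_polish (1)</CODE> returns 0 and <CODE>plural_polish (1000001)</CODE>
returns 2. This is why the reduction is applied only to values above
<CODE>ULONG_MAX</CODE>, which can never be 0 or 1.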
</P>


<H3><A NAME="SEC175" HREF="gettext_toc.html#TOC175">11.2.7 Optimization of the *gettext functions</A></H3>
<P>
<A NAME="IDX1039"></A>

</P>
<P>
At this point of the discussion we should talk about an advantage of the
GNU <CODE>gettext</CODE> implementation. Some readers might point out
that an internationalized program can perform poorly if some
string has to be translated in an inner loop. While this is unavoidable
when the string varies from one iteration of the loop to the next, it is
simply a waste of time when the string is always the same. Take the
following example:

</P>

<PRE>
{
  while (...)
    {
      puts (gettext ("Hello world"));
    }
}
</PRE>

<P>
When the locale selection does not change between two iterations, the resulting
string is always the same. One way to exploit this is:

</P>

<PRE>
{
  const char *str = gettext ("Hello world");
  while (...)
    {
      puts (str);
    }
}
</PRE>

<P>
But this solution is not usable in all situations (e.g.
when the locale
selection changes), nor does it lead to legible code.

</P>
<P>
For this reason, GNU <CODE>gettext</CODE> caches previous translation results.
When the same translation is requested twice, with no new message
catalogs being loaded in between, <CODE>gettext</CODE> will, the second time,
find the result through a single cache lookup.

</P>


<H2><A NAME="SEC176" HREF="gettext_toc.html#TOC176">11.3 Comparing the Two Interfaces</A></H2>
<P>
<A NAME="IDX1040"></A>
<A NAME="IDX1041"></A>

</P>

<P>
The following discussion is perhaps a little biased. As said
above, we implemented GNU <CODE>gettext</CODE> following the Uniforum
proposal, and this surely has its reasons. But the discussion should show how we
came to this decision.

</P>
<P>
First we take a look at the development process. When we write an
application using the NLS provided by <CODE>gettext</CODE>, we proceed as always.
Only when we come to a string which might be seen by the users, and thus
has to be translated, do we use <CODE>gettext("...")</CODE> instead of
<CODE>"..."</CODE>.
At the beginning of each source file (or in a central
header file) we define

</P>

<PRE>
#define gettext(String) (String)
</PRE>

<P>
Even this definition can be avoided when the system supports the
<CODE>gettext</CODE> function in its C library. When we compile this code, the
result is the same as if no NLS code were used. If you take a look at
the GNU <CODE>gettext</CODE> code, you will see that we use <CODE>_("...")</CODE>
instead of <CODE>gettext("...")</CODE>. This reduces the number of
additional characters per translatable string to <EM>3</EM> (in words:
three).

</P>
<P>
When a production version of the program is needed, we simply replace
the definition

</P>

<PRE>
#define _(String) (String)
</PRE>

<P>
by

</P>
<P>
<A NAME="IDX1042"></A>

<PRE>
#include <libintl.h>
#define _(String) gettext (String)
</PRE>

<P>
Additionally, we run the program <TT>‘xgettext’</TT> on all source code files
that contain translatable strings, and that's it: we have a running
program which does not depend on translations to be
available, but which
can use any translations that become available.

</P>
<P>
<A NAME="IDX1043"></A>
The same procedure can be applied to the <CODE>gettext_noop</CODE> invocations
(see section <A HREF="gettext_4.html#SEC18">4.7 Special Cases of Translatable Strings</A>). One usually defines <CODE>gettext_noop</CODE> as a
no-op macro, so you should consider the following code for your project:

</P>

<PRE>
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
</PRE>

<P>
<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>. The <TT>‘Makefile’</TT> in
the <TT>‘po/’</TT> directory of GNU <CODE>gettext</CODE> knows both of these
short forms by default, so you are invited to follow this proposal for
your own convenience.

</P>
<P>
Now to <CODE>catgets</CODE>. The main problem is the work for the
programmer. Every time he comes to a translatable string, he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file. He also has to take care of duplicate
entries, duplicate message IDs, etc. If he wants the message catalog to
have the same quality as the one the GNU <CODE>gettext</CODE> programs
provide, he also has to put the descriptive comments for the strings, and
their locations in all source code files, into the message catalog.
This is
nearly a Mission: Impossible.

</P>
<P>
But there are also some points that people might call advantages of
<CODE>catgets</CODE>. If a string consists of a single word and is used in
different contexts, it is likely that in one language or another the
word needs different translations. Example:

</P>

<PRE>
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
</PRE>

<P>
Here we have to translate the string <CODE>"number"</CODE> twice. Even
if you do not speak a language other than English, you might be able to
recognize that the two occurrences have different meanings. In German the
first occurrence has to be translated to <CODE>"Anzahl"</CODE> and the second
to <CODE>"Zahl"</CODE>.

</P>
<P>
Now you can say that this example is really esoteric. And you are
right! This is exactly how we felt about this problem, and we decided that
it does not carry much weight. The solution to the above problem is
quite simple:

</P>

<PRE>
printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ?
        gettext ("you should see %d number")
        : gettext ("you should see %d numbers"),
        number_count);
</PRE>

<P>
We believe that we can solve all conflicts with this method. If that is
difficult, one can also consider changing one of the conflicting strings
slightly. In any case, the problem is not impossible to overcome.

</P>
<P>
<CODE>catgets</CODE> allows the same original entry to have different translations,
but <CODE>gettext</CODE> has another, scalable approach for solving ambiguities
of this kind: see section <A HREF="gettext_11.html#SEC170">11.2.2 Solving Ambiguities</A>.

</P>


<H2><A NAME="SEC177" HREF="gettext_toc.html#TOC177">11.4 Using libintl.a in own programs</A></H2>

<P>
Starting with version 0.9.4, the library <CODE>libintl.a</CODE> should be
self-contained. That is, you can use it in your own programs without
providing additional functions. The <TT>‘Makefile’</TT> will put the header
and the library in directories selected using <CODE>$(prefix)</CODE>.

</P>


<H2><A NAME="SEC178" HREF="gettext_toc.html#TOC178">11.5 Being a <CODE>gettext</CODE> grok</A></H2>

<P>
<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
revised.
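</P>
<P>
Much of what follows depends on how GNU <CODE>gettext</CODE> picks the language for a message. As a rough, hypothetical model of the priority list given below under “Changing the language at runtime”, the C sketch that follows returns the first usable value among the relevant environment variables. The helper names <CODE>nonempty_env</CODE> and <CODE>effective_language</CODE> are illustrative assumptions, as is treating an empty variable like an unset one; the real implementation applies further rules that are not modelled here.
</P>

```c
#include <stdlib.h>

/* Hypothetical helper: the value of environment variable NAME, or NULL
   if it is unset or empty (treating "" as unset is an assumption).  */
static const char *
nonempty_env (const char *name)
{
  const char *value = getenv (name);
  return (value != NULL && value[0] != '\0') ? value : NULL;
}

/* Hypothetical model of the priority list: LANGUAGE, then LC_ALL, then
   the variable named after the category (for example "LC_MESSAGES"),
   then LANG.  Returns the first usable value, or "C" if none is set.
   This is not the GNU gettext source; the real code applies more rules.  */
const char *
effective_language (const char *category_var)
{
  const char *const order[] = { "LANGUAGE", "LC_ALL", category_var, "LANG" };
  size_t i;

  for (i = 0; i < sizeof order / sizeof order[0]; i++)
    {
      const char *value = nonempty_env (order[i]);
      if (value != NULL)
        return value;
    }
  return "C";
}
```

<P>
For example, with only <CODE>LANG=de_DE</CODE> and <CODE>LC_ALL=es_ES</CODE> set, <CODE>effective_language ("LC_MESSAGES")</CODE> returns <CODE>"es_ES"</CODE>; after <CODE>setenv ("LANGUAGE", "fr", 1)</CODE> it returns <CODE>"fr"</CODE>, because <CODE>LANGUAGE</CODE> is consulted first.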
</P>
<P>
To fully exploit the functionality of the GNU <CODE>gettext</CODE> library, it
is certainly helpful to read the source code. But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:

</P>

<UL>
<LI>Changing the language at runtime

<A NAME="IDX1044"></A>

For interactive programs, it might be useful to let the user select the
language at runtime. To understand how to do this, one needs to know
how the language is determined while the <CODE>gettext</CODE>
function executes. The method presented here only works correctly
with the GNU implementation of the <CODE>gettext</CODE> functions.

On every call, the <CODE>dcgettext</CODE> function determines and uses the
current setting of the highest-priority environment variable.
1455*946379e7SchristosHighest priority means here the following list with decreasing 1456*946379e7Schristospriority: 1457*946379e7Schristos 1458*946379e7Schristos 1459*946379e7Schristos<OL> 1460*946379e7Schristos<LI><CODE>LANGUAGE</CODE> 1461*946379e7Schristos 1462*946379e7Schristos<A NAME="IDX1045"></A> 1463*946379e7Schristos 1464*946379e7Schristos<A NAME="IDX1046"></A> 1465*946379e7Schristos<LI><CODE>LC_ALL</CODE> 1466*946379e7Schristos 1467*946379e7Schristos<A NAME="IDX1047"></A> 1468*946379e7Schristos<A NAME="IDX1048"></A> 1469*946379e7Schristos<A NAME="IDX1049"></A> 1470*946379e7Schristos<A NAME="IDX1050"></A> 1471*946379e7Schristos<A NAME="IDX1051"></A> 1472*946379e7Schristos<A NAME="IDX1052"></A> 1473*946379e7Schristos<LI><CODE>LC_xxx</CODE>, according to selected locale 1474*946379e7Schristos 1475*946379e7Schristos<A NAME="IDX1053"></A> 1476*946379e7Schristos<LI><CODE>LANG</CODE> 1477*946379e7Schristos 1478*946379e7Schristos</OL> 1479*946379e7Schristos 1480*946379e7SchristosAfterwards the path is constructed using the found value and the 1481*946379e7Schristostranslation file is loaded if available. 1482*946379e7Schristos 1483*946379e7SchristosWhat happens now when the value for, say, <CODE>LANGUAGE</CODE> changes? According 1484*946379e7Schristosto the process explained above the new value of this variable is found 1485*946379e7Schristosas soon as the <CODE>dcgettext</CODE> function is called. But this also means 1486*946379e7Schristosthe (perhaps) different message catalog file is loaded. In other 1487*946379e7Schristoswords: the used language is changed. 1488*946379e7Schristos 1489*946379e7SchristosBut there is one little hook. The code for gcc-2.7.0 and up provides 1490*946379e7Schristossome optimization. This optimization normally prevents the calling of 1491*946379e7Schristosthe <CODE>dcgettext</CODE> function as long as no new catalog is loaded. 
But
if <CODE>dcgettext</CODE> is not called, the program also cannot notice that the
<CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext_11.html#SEC175">11.2.7 Optimization of the *gettext functions</A>). The
solution is very simple: include the following code in the
language-switching function.


<PRE>
  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known: bumping this counter invalidates the
     translations cached by the optimized code.  */
  {
    extern int _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  }
</PRE>

<A NAME="IDX1054"></A>
The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>‘loadmsgcat.c’</TT>.
You don't need to know what it is for. But it can be used to detect
whether a <CODE>gettext</CODE> implementation is GNU gettext rather than a
non-GNU system's native gettext implementation.

</UL>



<H2><A NAME="SEC179" HREF="gettext_toc.html#TOC179">11.6 Temporary Notes for the Programmers Chapter</A></H2>

<P>
<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
revised.
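</P>
<P>
The notes that follow contrast the two methods, which differ mainly in how a message is looked up. As an illustration only, the C sketch below fetches the same message both ways using the standard <CODE>catgets</CODE> and <CODE>gettext</CODE> calls. The set and message numbers <CODE>GREETING_SET</CODE> and <CODE>HELLO_MSG</CODE> and the helpers <CODE>hello_catgets</CODE> and <CODE>hello_gettext</CODE> are hypothetical.
</P>

```c
#include <libintl.h>
#include <nl_types.h>

/* The English text.  gettext uses it as the lookup key; catgets only
   returns it as a fallback when no catalog entry is found.  */
static const char hello_text[] = "Hello world";

/* Hypothetical set and message numbers for the catgets variant; they
   must be kept in sync with the catalog source file by hand.  */
enum { GREETING_SET = 1, HELLO_MSG = 1 };

/* X/Open style: the message is identified by (set, number).  CATD comes
   from catopen(); keep it open while the result is in use, since the
   string may point into the catalog's memory.  */
const char *
hello_catgets (nl_catd catd)
{
  /* Checking for the (nl_catd) -1 that a failed catopen() returns avoids
     relying on how catgets() treats an invalid descriptor.  */
  if (catd == (nl_catd) -1)
    return hello_text;
  return catgets (catd, GREETING_SET, HELLO_MSG, hello_text);
}

/* Uniforum style: the English text itself is the key.  If no
   translation is available, gettext returns its argument.  */
const char *
hello_gettext (void)
{
  return gettext (hello_text);
}
```

<P>
A caller would obtain the descriptor with something like <CODE>catopen ("greetings", NL_CAT_LOCALE)</CODE>, where the catalog name is again an assumption, and release it with <CODE>catclose</CODE> once the strings are no longer needed. With no catalog or translation installed, both helpers return the English text unchanged; the difference lies in the bookkeeping, since the <CODE>catgets</CODE> variant also needs a matching numbered entry in the catalog source file.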
</P>



<H3><A NAME="SEC180" HREF="gettext_toc.html#TOC180">11.6.1 Temporary - Two Possible Implementations</A></H3>

<P>
There are two competing methods for language-independent messages:
the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
method. The <CODE>catgets</CODE> method indexes messages by integers; the
<CODE>gettext</CODE> method indexes them by their English translations.
The <CODE>catgets</CODE> method has been around longer and is supported
by more vendors. The <CODE>gettext</CODE> method is supported by Sun,
and reportedly the COSE multi-vendor initiative supports it as well.

</P>
<P>
Neither method is in the POSIX standard. There was much disagreement
in the POSIX.1 committee about using the <CODE>gettext</CODE> routines
vs. <CODE>catgets</CODE> (XPG). In the end, the committee couldn't
agree on anything, so no messaging system was included as part
of the standard. I believe the informative annex of the standard
includes the XPG3 messaging interfaces, “...as an example of
a messaging system that has been implemented...”

</P>
<P>
They were very careful not to say anywhere that you should use one
set of interfaces over the other.
For more on this topic, please
see the Programming for Internationalization FAQ.

</P>


<H3><A NAME="SEC181" HREF="gettext_toc.html#TOC181">11.6.2 Temporary - About <CODE>catgets</CODE></A></H3>

<P>
There have been a few discussions of late on the use of
<CODE>catgets</CODE> as a base. I think it is important to present both
sides of the argument, and hence I am opting to play devil's advocate
for a little while.

</P>
<P>
I won't deny that <CODE>catgets</CODE> could have been designed
a lot better. It currently has quite a number of limitations, and
these have already been pointed out.

</P>
<P>
However, there is a great deal to be said for consistency and
standardization. A recurring problem when writing Unix
software is the myriad portability problems across Unix platforms.
It seems as if every Unix vendor had a look at the operating system
and found parts they could improve upon. No doubt many of these
modifications are innovative and solve real problems.
However, software developers have a hard time keeping up with all
these changes across so many platforms.

</P>
<P>
This has prompted the Unix vendors to begin standardizing their
systems; hence the impetus for Spec1170.
Every major Unix vendor
has committed to supporting this standard, and every Unix software
developer eagerly awaits the day they can write software to this
standard and simply recompile it (without having to use autoconf)
across different platforms.

</P>
<P>
As I understand it, Spec1170 is roughly based upon version 4 of the
X/Open Portability Guide (XPG4). Because <CODE>catgets</CODE> and
friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>
is a part of Spec1170 and hence will become a standardized component
of all Unix systems.

</P>


<H3><A NAME="SEC182" HREF="gettext_toc.html#TOC182">11.6.3 Temporary - Why a single implementation</A></H3>

<P>
Now, it seems rather wasteful to me to have two different systems
installed for accessing message catalogs. If we do want to remedy
the deficiencies of <CODE>catgets</CODE>, why don't we try to extend <CODE>catgets</CODE>
(in a compatible manner) rather than implement an entirely new system?
Otherwise, we'll end up with two message catalog access systems installed
with an operating system: one set of routines for packages using GNU
<CODE>gettext</CODE> for their internationalization, and another set of routines
(<CODE>catgets</CODE>) for all other software. Bloated?

</P>
<P>
Suppose another catalog access system is implemented.
Which do
we recommend? At least for Linux, we need to attract as many
software developers as possible. Hence we need to make it as easy as
possible for them to port their software, which means supporting
<CODE>catgets</CODE>. We will be implementing the <CODE>libintl</CODE> code
within our <CODE>libc</CODE>, but does this mean we also have to incorporate
another message catalog access scheme within our <CODE>libc</CODE>?
And what about people who are going to be using the <CODE>libintl</CODE>
+ non-<CODE>catgets</CODE> routines? When they port their software to
other platforms, they're now going to have to include the front-end
(<CODE>libintl</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>
access routines) with their software instead of just including the
<CODE>libintl</CODE> code.

</P>
<P>
Message catalog support, however, is only the tip of the iceberg.
What about the data for the other locale categories? They also have
a number of deficiencies. Are we going to abandon them as well and
develop another duplicate set of routines (should <CODE>libintl</CODE>
expand beyond message catalog support)?

</P>
<P>
As with many parts of Unix that could be improved upon, we're stuck balancing
compatibility with the past against useful improvements and innovations for
the future.
</P>



<H3><A NAME="SEC183" HREF="gettext_toc.html#TOC183">11.6.4 Temporary - Notes</A></H3>

<P>
X/Open agreed on the standard form very late, so many
implementations differ from the final form. Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.

</P>
<P>
OK. After incorporating the last changes, I have to spend some time on
implementing the <CODE>gettext</CODE> functions in GNU/Linux <CODE>libc</CODE>.
So in the future, Solaris will not be the only system having <CODE>gettext</CODE>.

</P>
</BODY>
</HTML>