<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 27 November 2006 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 11 The Programmer's View</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_10.html">previous</A>, <A HREF="gettext_12.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC164" HREF="gettext_toc.html#TOC164">11 The Programmer's View</A></H1>

<P>
One aim of the current message catalog implementation provided by
GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
installer wishes to do so. So perhaps we should first take a look at
the solutions we know about. The people in the POSIX committee did not
manage to agree on one of the semi-official standards which we'll
describe below. In fact they couldn't agree on anything, so they decided
only to include an example of an interface. The major Unix vendors
are split in their usage of the two most important specifications: X/Open's
catgets vs. Uniforum's gettext interface. We'll describe them both and
later explain our solution to this dilemma.

</P>



<H2><A NAME="SEC165" HREF="gettext_toc.html#TOC165">11.1 About <CODE>catgets</CODE></A></H2>
<P>
<A NAME="IDX1006"></A>

</P>
<P>
The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
process of creating this standard seemed to be too slow for some of
the Unix vendors, so they based their implementations on preliminary
versions of the standard. Of course this again leads to problems when
writing platform-independent programs: even the use of <CODE>catgets</CODE>
does not guarantee a uniform interface.
</P>
<P>
Another, personal comment on this: only a bunch of committee members
could have come up with this interface. They never really tried to program
using it. It is a fast, memory-saving implementation, and a
user can happily live with it. But programmers hate it (at least I and
some others do...).

</P>
<P>
But we must not forget one point: after all the trouble with transferring
the rights to Unix(tm), they finally went to X/Open, the very same organization that
published this specification. This leads me to predict
that this interface will be in future Unix standards (e.g. Spec1170) and
therefore part of all Unix implementations (that is, implementations which are
<EM>allowed</EM> to bear this name).

</P>



<H3><A NAME="SEC166" HREF="gettext_toc.html#TOC166">11.1.1 The Interface</A></H3>
<P>
<A NAME="IDX1007"></A>

</P>
<P>
The interface to the <CODE>catgets</CODE> implementation consists of three
functions which correspond to those used in file access: <CODE>catopen</CODE>
to open the catalog for use, <CODE>catgets</CODE> for accessing the message
tables, and <CODE>catclose</CODE> for closing it after work is done. Prototypes
for the functions and the needed definitions are in the
<CODE><nl_types.h></CODE> header file.

</P>
<P>
<A NAME="IDX1008"></A>
<CODE>catopen</CODE> is used like this:

</P>

<PRE>
nl_catd catd = catopen ("catalog_name", 0);
</PRE>

<P>
The first argument is the name of the catalog, which usually
refers to the name of the program or the package. The second parameter
is not further specified in the standard. I don't even know whether it
is implemented consistently among various systems, so the common advice
is to use <CODE>0</CODE> as the value. The return value is a handle to the
message catalog, analogous to the file descriptors returned by <CODE>open</CODE>.
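</P>
<P>
Putting the three calls together, a minimal sketch of a catalog lookup
might look as follows. The catalog name, the set number and the message
number are hypothetical, as is the helper <CODE>lookup_greeting</CODE>. The case
of a catalog that cannot be opened (<CODE>catopen</CODE> returning
<CODE>(nl_catd) -1</CODE>) is handled explicitly, instead of relying on how a
particular system's <CODE>catgets</CODE> treats an invalid descriptor.

</P>

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical set and message numbers; in a real catalog they must match
   the numbering used when the catalog was built. */
#define GREETING_SET 1
#define GREETING_MSG 1

/* Look up one message through catopen/catgets/catclose and copy it into
   BUF.  DEFAULT_TEXT is used when the catalog cannot be opened or does
   not contain the message.  Returns what snprintf returns. */
int
lookup_greeting (const char *catalog_name, const char *default_text,
                 char *buf, size_t size)
{
  nl_catd catd = catopen (catalog_name, 0);  /* 0: the usual advice */
  if (catd == (nl_catd) -1)
    return snprintf (buf, size, "%s", default_text);

  /* catgets returns DEFAULT_TEXT if any addressing stage fails.  The
     result must not be modified, so it is copied into BUF. */
  const char *text = catgets (catd, GREETING_SET, GREETING_MSG,
                              default_text);
  int len = snprintf (buf, size, "%s", text);

  catclose (catd);  /* CATD must not be used after this point */
  return len;
}
```

<P>
Copying the result before <CODE>catclose</CODE> is a deliberate choice: the
string returned by <CODE>catgets</CODE> belongs to the catalog and should not be
assumed to stay valid once the descriptor has been closed.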
</P>
<P>
<A NAME="IDX1009"></A>
This handle is of course used in the <CODE>catgets</CODE> function, which can
be used like this:

</P>

<PRE>
char *translation = catgets (catd, set_no, msg_id, "original string");
</PRE>

<P>
The first parameter is this catalog descriptor. The second parameter
specifies the set of messages in this catalog from which the message
identified by <CODE>msg_id</CODE> is obtained. <CODE>catgets</CODE> therefore uses
three-stage addressing:

</P>

<PRE>
catalog name => set number => message ID => translation
</PRE>

<P>
The fourth argument is not used to address the translation. It is given
as a default value, returned when one of the addressing stages fails. One
important thing to remember is that although the return type of <CODE>catgets</CODE>
is <CODE>char *</CODE>, the resulting string <EM>must not</EM> be changed. It
should really be <CODE>const char *</CODE>, but the standard was published in
1988, one year before ANSI C.

</P>
<P>
<A NAME="IDX1010"></A>
The last of these functions is used, and behaves, as expected:

</P>

<PRE>
catclose (catd);
</PRE>

<P>
After this, no <CODE>catgets</CODE> call using the descriptor is legal anymore.

</P>


<H3><A NAME="SEC167" HREF="gettext_toc.html#TOC167">11.1.2 Problems with the <CODE>catgets</CODE> Interface?!</A></H3>
<P>
<A NAME="IDX1011"></A>

</P>
<P>
This description seems really easy, so where are the
problems we speak of? In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain. The
reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
message ID. This has to be a numeric value for all messages in a single
set. Perhaps you can imagine the problems of keeping such a list up to date while
changing the source code.
Add a new message here, remove one there. Of
course a lot of tools have been developed to help organize this
chaos, but each of them fails in one aspect or another. We don't
want to say that the other approach has no problems, but they are far
easier to manage.

</P>


<H2><A NAME="SEC168" HREF="gettext_toc.html#TOC168">11.2 About <CODE>gettext</CODE></A></H2>
<P>
<A NAME="IDX1012"></A>

</P>
<P>
The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
proposal. It was submitted there by Sun, who had implemented the
<CODE>gettext</CODE> function in SunOS 4, around 1990. Nowadays, the
<CODE>gettext</CODE> interface is specified by the OpenI18N standard.

</P>
<P>
The main point about this solution is that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here as well, but this key is the message
itself (however long or short it is). See section <A HREF="gettext_11.html#SEC176">11.3 Comparing the Two Interfaces</A> for a more
detailed comparison of the two methods.

</P>
<P>
The following section contains a rather detailed description of the
interface. We make it this detailed because this is the interface
we chose for the GNU <CODE>gettext</CODE> Library. Programmers interested
in using this library will be interested in this description.

</P>



<H3><A NAME="SEC169" HREF="gettext_toc.html#TOC169">11.2.1 The Interface</A></H3>
<P>
<A NAME="IDX1013"></A>

</P>
<P>
The minimal functionality an interface must have is a) to select a
domain the strings are coming from (a single domain for all programs is
not reasonable because its construction and maintenance are difficult,
perhaps impossible) and b) to access a string in a selected domain.
</P>
<P>
This is essentially the description of the <CODE>gettext</CODE> interface. It
has a global domain, which unqualified calls refer to. Of course this
domain is selectable by the user.

</P>

<PRE>
char *textdomain (const char *domain_name);
</PRE>

<P>
This makes it possible to change or query the current global domain of
the <CODE>LC_MESSAGES</CODE> category. The
argument is a null-terminated string whose characters must be valid in
file names. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
the function returns the current value. If no value has been set
before, the name of the default domain is returned: <EM>messages</EM>.
Please note that although the return value of <CODE>textdomain</CODE> is of
type <CODE>char *</CODE>, it must not be changed. It is also important to know
that no check is made whether the domain is available. If it is not
available, you will notice this only because no translations are provided.

</P>
<P>
To use a domain set by <CODE>textdomain</CODE>, the function

</P>

<PRE>
char *gettext (const char *msgid);
</PRE>

<P>
is to be used. This is the simplest reasonable form one can imagine.
The translation of the string <VAR>msgid</VAR> is returned if it is available
in the current domain. If it is not available, the argument itself is
returned. If the argument is <CODE>NULL</CODE>, the result is undefined.

</P>
<P>
One thing to keep in mind is that the call does not explicitly name
the domain used. The current value of the domain for the
<CODE>LC_MESSAGES</CODE> category is used. If this changes between two
executions of the same <CODE>gettext</CODE> call in the program, the two calls
reference different message catalogs.
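</P>
<P>
A minimal sketch of the calls described so far, assuming a hypothetical
package name <CODE>hello-example</CODE> and hypothetical helpers
<CODE>init_i18n</CODE> and <CODE>greeting</CODE>. The <CODE>setlocale</CODE> call (so that
the user's locale settings take effect) and the <CODE>bindtextdomain</CODE> call,
described in the next section, are included because a real program normally
needs them as well; the directory used is the GNU default given later in this
chapter.

</P>

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical package name and catalog directory. */
#define PACKAGE   "hello-example"
#define LOCALEDIR "/usr/local/share/locale"

/* Start-up sequence: honor the user's locale, say where the catalogs of
   PACKAGE are installed (an absolute path), and select PACKAGE as the
   global domain used by every unqualified gettext call. */
void
init_i18n (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
}

/* From here on, translatable strings are simply filtered through gettext.
   Without a matching catalog the argument itself is returned. */
const char *
greeting (void)
{
  return gettext ("Hello, world!");
}
```

<P>
After <CODE>init_i18n</CODE> has run, <CODE>textdomain (NULL)</CODE> returns
<CODE>"hello-example"</CODE>; without the <CODE>textdomain</CODE> call it would return
the default domain, <CODE>"messages"</CODE>.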
</P>
<P>
In the simplest case, which is the one normally used in internationalized
packages, <CODE>textdomain</CODE> is called once at the beginning of execution,
setting the domain to a unique name, normally the package
name. In the code that follows, all strings which have to be translated are
filtered through the <CODE>gettext</CODE> function. That's all; the package speaks
your language.

</P>


<H3><A NAME="SEC170" HREF="gettext_toc.html#TOC170">11.2.2 Solving Ambiguities</A></H3>
<P>
<A NAME="IDX1014"></A>
<A NAME="IDX1015"></A>
<A NAME="IDX1016"></A>

</P>
<P>
While this single-domain approach works well for most applications, there
might be the need to get translations from more than one domain. Of
course one could switch between different domains with calls to
<CODE>textdomain</CODE>, but this is neither convenient nor fast. One
possible situation was a subject of discussion at the time of
writing: all
error messages of functions in the set of commonly used functions should
go into a separate domain <CODE>error</CODE>. This way we would only need
to translate them once.
Another case is messages from a library, as these <EM>have</EM> to be
independent of the current domain set by the application.

</P>
<P>
For these reasons there are two more functions to retrieve strings:

</P>

<PRE>
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
</PRE>

<P>
Both take an additional first argument, which corresponds
to the argument of <CODE>textdomain</CODE>. The third argument of
<CODE>dcgettext</CODE> allows using a locale category other than <CODE>LC_MESSAGES</CODE>.
But I really don't know where this could be useful.
If the
<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than
the known ones, the result is undefined. It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.

</P>
<P>
A second ambiguity can arise from the fact that perhaps more than one
domain has the same name. This can be solved by specifying where the
needed message catalog files can be found.

</P>

<PRE>
char *bindtextdomain (const char *domain_name,
                      const char *dir_name);
</PRE>

<P>
Calling this function binds the given domain to a file in the specified
directory (how this file is determined is described below). In particular, a
file in the system's default location is no longer preferred over the specified
file (as it would be when solely using <CODE>textdomain</CODE>). A
<CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding
associated with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself is
<CODE>NULL</CODE>, nothing happens and a <CODE>NULL</CODE> pointer is returned. Here
again, as for all the other functions, the return
value must not be changed!

</P>
<P>
It is important to remember that relative path names for the
<VAR>dir_name</VAR> parameter can cause trouble. Since the path is always
computed relative to the current directory, different results will be
obtained after the program executes a <CODE>chdir</CODE> call. Relative
paths should therefore always be avoided, to prevent such dependencies and
unreliability.

</P>


<H3><A NAME="SEC171" HREF="gettext_toc.html#TOC171">11.2.3 Locating Message Catalog Files</A></H3>
<P>
<A NAME="IDX1017"></A>

</P>
<P>
Because many different languages for many different packages have to be
stored, we need some way to add this information to the message catalog
files.
The usual way in Unix environments is to encode it
in the file name, and this is also done here. The directory name given in
the second argument of <CODE>bindtextdomain</CODE> (or the default directory),
the locale name, the category name, and the domain name are
concatenated:

</P>

<PRE>
<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
</PRE>

<P>
The default value for <VAR>dir_name</VAR> is system-specific. For the GNU
library, and for packages adhering to its conventions, it's:

<PRE>
/usr/local/share/locale
</PRE>

<P>
<VAR>locale</VAR> is the name of the locale selected for the category
<CODE>LC_<VAR>category</VAR></CODE>. For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
<CODE>LC_<VAR>category</VAR></CODE> is always <CODE>LC_MESSAGES</CODE>.<A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
The value of the locale is determined through
<CODE>setlocale (LC_<VAR>category</VAR>, NULL)</CODE>.
<A NAME="DOCF4" HREF="gettext_foot.html#FOOT4">(4)</A>
<CODE>dcgettext</CODE> specifies the locale category by its third argument.

</P>


<H3><A NAME="SEC172" HREF="gettext_toc.html#TOC172">11.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A></H3>
<P>
<A NAME="IDX1018"></A>
<A NAME="IDX1019"></A>

</P>
<P>
<CODE>gettext</CODE> not only looks up a translation in a message catalog. It
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

</P>
<P>
The output character set is, by default, the value of <CODE>nl_langinfo
(CODESET)</CODE>, which depends on the <CODE>LC_CTYPE</CODE> part of the current
locale.
But programs which store strings in a locale-independent way
(e.g. UTF-8) can request that <CODE>gettext</CODE> and related functions
return the translations in that encoding, by use of the
<CODE>bind_textdomain_codeset</CODE> function.

</P>
<P>
Note that the <VAR>msgid</VAR> argument to <CODE>gettext</CODE> is not subject to
character set conversion. Also, when <CODE>gettext</CODE> does not find a
translation for <VAR>msgid</VAR>, it returns <VAR>msgid</VAR> unchanged,
independently of the current output character set. It is therefore
recommended that all <VAR>msgid</VAR>s be US-ASCII strings.

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>bind_textdomain_codeset</B> <I>(const char *<VAR>domainname</VAR>, const char *<VAR>codeset</VAR>)</I>
<DD><A NAME="IDX1020"></A>
The <CODE>bind_textdomain_codeset</CODE> function can be used to specify the
output character set for message catalogs for domain <VAR>domainname</VAR>.
The <VAR>codeset</VAR> argument must be a valid codeset name which can be used
for the <CODE>iconv_open</CODE> function, or a null pointer.

</P>
<P>
If the <VAR>codeset</VAR> parameter is the null pointer,
<CODE>bind_textdomain_codeset</CODE> returns the currently selected codeset
for the domain with the name <VAR>domainname</VAR>. It returns <CODE>NULL</CODE> if
no codeset has yet been selected.

</P>
<P>
The <CODE>bind_textdomain_codeset</CODE> function can be used several times.
If used multiple times with the same <VAR>domainname</VAR> argument, the
later call overrides the settings made by the earlier one.

</P>
<P>
The <CODE>bind_textdomain_codeset</CODE> function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user.
If the system ran out of memory during the execution of
<CODE>bind_textdomain_codeset</CODE>, the return value is <CODE>NULL</CODE> and the
global variable <VAR>errno</VAR> is set accordingly.
</DL>

</P>


<H3><A NAME="SEC173" HREF="gettext_toc.html#TOC173">11.2.5 Using contexts for solving ambiguities</A></H3>
<P>
<A NAME="IDX1021"></A>
<A NAME="IDX1022"></A>
<A NAME="IDX1023"></A>
<A NAME="IDX1024"></A>

</P>
<P>
One place where the <CODE>gettext</CODE> functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts their
length. But strings which do not contain entire sentences, or at
least large fragments of a sentence, may appear in more than one
situation in the program and might need different translations there. This is
especially true for the one-word strings which are frequently used in
GUI programs.

</P>
<P>
As a consequence, many people say that the <CODE>gettext</CODE> approach is
wrong and that <CODE>catgets</CODE>, which indeed does not have this problem,
should be used instead. But there is a very simple and powerful method to
handle this kind of problem with the <CODE>gettext</CODE> functions.

</P>
<P>
Contexts can be added to strings to be translated. A context-dependent
translation lookup searches for the translation of a given string only
within a given context. The translation for the same string
in a different context can be different. The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.

</P>
<P>
The <TT>‘gettext.h’</TT> include file contains the lookup macros for strings
with contexts.
They are implemented as thin macros and inline functions
over the functions from <CODE><libintl.h></CODE>.

</P>
<P>
<A NAME="IDX1025"></A>

<PRE>
const char *pgettext (const char *msgctxt, const char *msgid);
</PRE>

<P>
In a call of this macro, <VAR>msgctxt</VAR> and <VAR>msgid</VAR> must be string
literals. The macro returns the translation of <VAR>msgid</VAR>, restricted
to the context given by <VAR>msgctxt</VAR>.

</P>
<P>
The <VAR>msgctxt</VAR> string is visible in the PO file to the translator.
You should try to make it canonical and stable, because
every time you change a <VAR>msgctxt</VAR>, the translator will have to review
the translation of <VAR>msgid</VAR>.

</P>
<P>
Finding a canonical <VAR>msgctxt</VAR> string that doesn't change over time can
be hard. But you shouldn't use the file name or class name containing the
<CODE>pgettext</CODE> call, because renaming a file or a class is a common
development task, and it shouldn't cause work for translators. Also, you shouldn't
use a comment in the form of a complete English sentence as <VAR>msgctxt</VAR>,
because orthography or grammar changes are often applied to such sentences,
and again, they shouldn't force the translator to do a review.

</P>
<P>
The <SAMP>‘p’</SAMP> in <SAMP>‘pgettext’</SAMP> stands for “particular”: <CODE>pgettext</CODE>
fetches a particular translation of the <VAR>msgid</VAR>.

</P>
<P>
<A NAME="IDX1026"></A>
<A NAME="IDX1027"></A>

<PRE>
const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);
</PRE>

<P>
These are generalizations of <CODE>pgettext</CODE>. They behave similarly to
<CODE>dgettext</CODE> and <CODE>dcgettext</CODE>, respectively.
The <VAR>domain_name</VAR>
argument defines the translation domain. The <VAR>category</VAR> argument
allows using a locale facet other than <CODE>LC_MESSAGES</CODE>.

</P>
<P>
As an example, consider the following fictional situation. A GUI program
has a menu bar with the following entries:

</P>

<PRE>
+------------+------------+--------------------------------------+
|    File    |  Printer   |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
</PRE>

<P>
To have the strings <CODE>File</CODE>, <CODE>Printer</CODE>, <CODE>Open</CODE>,
<CODE>New</CODE>, <CODE>Select</CODE>, and <CODE>Connect</CODE> translated, there has to be,
at some point in the code, a call to a function of the <CODE>gettext</CODE>
family. But in two places the string passed into the function would be
<CODE>Open</CODE>. The translations might not be the same, and therefore we
are in the dilemma described above.

</P>
<P>
What distinguishes the two places is the menu path from the menu root to
the particular menu entries:

</P>

<PRE>
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
</PRE>

<P>
The context is thus the menu path without its last part. So, the calls
look like this:

</P>

<PRE>
pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")
</PRE>

<P>
Whether or not to use the <SAMP>‘|’</SAMP> character at the end of the context is a
matter of style.
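</P>
<P>
How can one catalog hold two different translations of <CODE>Open</CODE>? As
far as we know, the macros in <TT>‘gettext.h’</TT> build a single lookup key
by joining the context and the <VAR>msgid</VAR> with the EOT character
<CODE>'\004'</CODE>, and fall back to the plain <VAR>msgid</VAR> when that key has no
translation. The function below, <CODE>my_pgettext</CODE>, is a hypothetical
runtime sketch of that idea; it is not the code from <TT>‘gettext.h’</TT>.

</P>

```c
#include <libintl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look up MSGID within context MSGCTXT.  The key is
   MSGCTXT, the separator '\004', then MSGID. */
const char *
my_pgettext (const char *msgctxt, const char *msgid)
{
  size_t ctxt_len = strlen (msgctxt);
  size_t id_len = strlen (msgid);
  char *key = malloc (ctxt_len + 1 + id_len + 1);
  if (key == NULL)
    return msgid;  /* out of memory: fall back to the untranslated text */

  memcpy (key, msgctxt, ctxt_len);
  key[ctxt_len] = '\004';
  memcpy (key + ctxt_len + 1, msgid, id_len + 1);  /* copies the NUL too */

  /* gettext returns its own argument when no translation exists, so the
     pointer comparison must happen before KEY is freed.  A translation
     that was found lives in the loaded catalog, not in KEY. */
  const char *translation = gettext (key);
  int found = (translation != key);
  free (key);
  return found ? translation : msgid;
}
```

<P>
With no catalog installed, <CODE>my_pgettext ("Menu|File|", "Open")</CODE> and
<CODE>my_pgettext ("Menu|Printer|", "Open")</CODE> both return <CODE>"Open"</CODE>;
once a translator supplies entries for the two contexts, the results can differ.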
</P>
<P>
For more complex cases, where the <VAR>msgctxt</VAR> or <VAR>msgid</VAR> are not
string literals, more general macros are available:

</P>
<P>
<A NAME="IDX1028"></A>
<A NAME="IDX1029"></A>
<A NAME="IDX1030"></A>

<PRE>
const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);
</PRE>

<P>
Here <VAR>msgctxt</VAR> and <VAR>msgid</VAR> can be arbitrary string-valued expressions.
These macros are more general. But when both argument expressions
are string literals, the macros without the <SAMP>‘_expr’</SAMP> suffix are more
efficient.

</P>


<H3><A NAME="SEC174" HREF="gettext_toc.html#TOC174">11.2.6 Additional functions for plural forms</A></H3>
<P>
<A NAME="IDX1031"></A>

</P>
<P>
The functions of the <CODE>gettext</CODE> family described so far (and all the
<CODE>catgets</CODE> functions as well) have one problem in the real world
which has been neglected completely in all existing approaches: the
handling of plural forms.

</P>
<P>
Looking through Unix source code before the time anybody thought about
internationalization (and, sadly, even afterwards), one can often find
code similar to the following:

</P>

<PRE>
   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
</PRE>

<P>
After the first complaints from people internationalizing the code, programmers
either completely avoided formulations like this or used strings like
<CODE>"file(s)"</CODE>. Both look unnatural and should be avoided.
First
attempts to solve the problem correctly looked like this:

</P>

<PRE>
   if (n == 1)
     printf ("%d file deleted", n);
   else
     printf ("%d files deleted", n);
</PRE>

<P>
But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an
‘s’,
but that is all. Once again people fell into the trap of believing the
rules their language uses are universal. But the handling of plural
forms differs widely between the language families. For example,
Rafal Maszkowski <CODE><rzm@mat.uni.torun.pl></CODE> reports:

</P>

<BLOCKQUOTE>
<P>
In Polish we use e.g. plik (file) this way:

<PRE>
1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w
</PRE>

<P>
and so on (o' means 8859-2 oacute which should be rather okreska,
similar to aogonek).
</BLOCKQUOTE>

<P>
There are two things which can differ between languages (and even inside
language families):

</P>

<UL>
<LI>

The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an
‘s’)
is hardly found in German.

<LI>

The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romanic and Germanic languages,
since there the number is the same (there are two).

But other language families have only one form or many forms. More
information on this follows in a separate section.
</UL>

<P>
The consequence of this is that application writers should not try to
solve the problem in their code. This would be localization, since it is
only usable for certain, hardcoded language environments.
Instead, the
extended <CODE>gettext</CODE> interface should be used.

</P>
<P>
Instead of the one key string, these extra functions take two
strings and a numerical argument. The idea behind this is that, using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the translator.
The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
<CODE>gettext</CODE> behavior). In this case the rules for Germanic languages
are used, and it is assumed that the first string argument is the singular
form and the second the plural form.

</P>
<P>
This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation, but since the GNU C library
(as well as the GNU <CODE>gettext</CODE> package) is written as part of the
GNU project, and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>ngettext</B> <I>(const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
<DD><A NAME="IDX1032"></A>
The <CODE>ngettext</CODE> function is similar to the <CODE>gettext</CODE> function
in that it finds the message catalogs in the same way. But it takes two
extra arguments. The <VAR>msgid1</VAR> parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The <VAR>msgid2</VAR> parameter is the plural form.
The parameter <VAR>n</VAR> is used to determine the plural form. If no
message catalog is found, <VAR>msgid1</VAR> is returned if <CODE>n == 1</CODE>,
otherwise <VAR>msgid2</VAR>.
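</P>
<P>
When a catalog is found, the rule supplied by the translator selects the form
instead. To make that step concrete, here is the selection for Polish written
as a hypothetical C function, <CODE>polish_plural_index</CODE>. The formula is our
own transcription of the table quoted above (1 plik, 2-4 pliki, 5-21 pliko'w,
22-24 pliki, ...), so treat it as an assumption; the returned index says which
of the translated forms <CODE>ngettext</CODE> would hand back.

</P>

```c
/* Hypothetical: plural-form index for Polish nouns such as "plik".
   0 -> "plik", 1 -> "pliki", 2 -> "pliko'w".  Written as a catalog
   header rule this would be roughly:
     nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4
                        && (n%100<10 || n%100>=20) ? 1 : 2;  */
unsigned int
polish_plural_index (unsigned long n)
{
  if (n == 1)
    return 0;                              /* 1 plik */
  if (n % 10 >= 2 && n % 10 <= 4           /* 2-4, 22-24, 32-34, ... */
      && (n % 100 < 10 || n % 100 >= 20))  /* ...but not 12-14 */
    return 1;                              /* pliki */
  return 2;                                /* 0, 5-21, 25-31, ...: pliko'w */
}
```

<P>
For example, <CODE>polish_plural_index (22)</CODE> is 1 (pliki) while
<CODE>polish_plural_index (12)</CODE> is 2 (pliko'w), matching the quoted table.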
</P>
<P>
An example of the use of this function is:

</P>

<PRE>
printf (ngettext ("%d file removed", "%d files removed", n), n);
</PRE>

<P>
Please note that the numeric value <VAR>n</VAR> has to be passed to the
<CODE>printf</CODE> function as well. It is not sufficient to pass it only to
<CODE>ngettext</CODE>.

</P>
<P>
In the English singular case, the number (always 1) can be replaced with
"one":

</P>

<PRE>
printf (ngettext ("One file removed", "%d files removed", n), n);
</PRE>

<P>
This works because the <SAMP>‘printf’</SAMP> function discards excess arguments that
are not consumed by the format string.

</P>
<P>
It is also possible to use this function when the strings don't contain a
cardinal number:

</P>

<PRE>
puts (ngettext ("Delete the selected file?",
                "Delete the selected files?",
                n));
</PRE>

<P>
In this case the number <VAR>n</VAR> is only used to choose the plural form.
</DL>

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>dngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
<DD><A NAME="IDX1033"></A>
The <CODE>dngettext</CODE> function is similar to the <CODE>dgettext</CODE> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <CODE>ngettext</CODE> handles them.
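<P>
A typical use of <CODE>dngettext</CODE> is a library whose messages must not
depend on the domain selected by the application. The sketch below assumes a
hypothetical library domain <CODE>mylib-example</CODE> and a hypothetical helper
<CODE>mylib_format_errors</CODE>; note that the count is passed twice, once to
select the plural form and once to <CODE>snprintf</CODE>.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical domain of a library; it is named explicitly in every
   lookup, so the application's textdomain setting does not matter. */
#define MYLIB_DOMAIN "mylib-example"

/* Write "N error(s) found" into BUF using the library's own catalog.
   Returns what snprintf returns. */
int
mylib_format_errors (char *buf, size_t size, unsigned long n)
{
  return snprintf (buf, size,
                   dngettext (MYLIB_DOMAIN,
                              "%lu error found", "%lu errors found", n),
                   n);
}
```

<P>
Without an installed catalog for <CODE>mylib-example</CODE>, a count of 1 yields
<CODE>"1 error found"</CODE> and a count of 3 yields <CODE>"3 errors found"</CODE>,
following the Germanic fallback rule described above.
</P>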
</DL>

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>dcngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>, int <VAR>category</VAR>)</I>
<DD><A NAME="IDX1034"></A>
The <CODE>dcngettext</CODE> function is similar to the <CODE>dcgettext</CODE> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <CODE>ngettext</CODE> handles them.
</DL>

</P>
<P>
Now, how do these functions solve the problem of the plural forms?
Without the input of linguists (which was not available) it was not
possible to determine whether there are only a few different ways in
which plural forms are built or whether the number can increase with
every newly supported language.

</P>
<P>
Therefore the solution implemented is to allow the translator to specify
the rules for selecting the plural form. Since the formula varies
with every language, this is the only viable solution apart from
hardcoding the information in the code (which would still have to be
extensible so as not to prevent the use of new languages).

</P>
<P>
<A NAME="IDX1035"></A>
<A NAME="IDX1036"></A>
<A NAME="IDX1037"></A>
The information about the plural form selection has to be stored in the
header entry of the PO file (the one with the empty <CODE>msgid</CODE> string).
The plural form information looks like this:

</P>

<PRE>
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
</PRE>

<P>
The <CODE>nplurals</CODE> value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following <CODE>plural</CODE> is an expression in C language
syntax.
Exceptions are that negative numbers are not allowed, numbers
must be decimal, and the only variable allowed is <CODE>n</CODE>. Spaces are
allowed in the expression, but backslash-newlines are not; in the
examples below, the backslash-newlines are present for formatting purposes
only. This expression is evaluated whenever one of the functions
<CODE>ngettext</CODE>, <CODE>dngettext</CODE>, or <CODE>dcngettext</CODE> is called. The
numeric value passed to these functions is substituted for every occurrence
of the variable <CODE>n</CODE> in the expression. The resulting value must
then be greater than or equal to zero and less than the value of
<CODE>nplurals</CODE>.

</P>
<P>
<A NAME="IDX1038"></A>
The following rules are known at this point. The languages are listed
together with their language families. This does not necessarily mean
that the information can be generalized to the whole family (as the
table below readily shows).<A NAME="DOCF5" HREF="gettext_foot.html#FOOT5">(5)</A>

</P>
<DL COMPACT>

<DT>Only one form
<DD>
Some languages require only a single form. There is no distinction
between the singular and plural form. An appropriate header entry
would look like this:


<PRE>
Plural-Forms: nplurals=1; plural=0;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Asian family
<DD>
Japanese, Korean, Vietnamese
<DT>Turkic/Altaic family
<DD>
Turkish
</DL>

<DT>Two forms, singular used for one only
<DD>
This is the form used in most existing programs, since it is what English
uses. A header entry would look like this:


<PRE>
Plural-Forms: nplurals=2; plural=n != 1;
</PRE>

(Note: this relies on the fact that boolean expressions in C evaluate
to zero or one.)

Languages with this property include:

<DL COMPACT>

<DT>Germanic family
<DD>
Danish, Dutch, English, Faroese, German, Norwegian, Swedish
<DT>Finno-Ugric family
<DD>
Estonian, Finnish
<DT>Latin/Greek family
<DD>
Greek
<DT>Semitic family
<DD>
Hebrew
<DT>Romanic family
<DD>
Italian, Portuguese, Spanish
<DT>Artificial
<DD>
Esperanto
</DL>

Another language using the same header entry is:

<DL COMPACT>

<DT>Finno-Ugric family
<DD>
Hungarian
</DL>

Hungarian does not appear to have a plural if you look at sentences involving
cardinal numbers. For example, “1 apple” is “1 alma”, and “123 apples” is
“123 alma”. But when the number is not explicit, the distinction between
singular and plural exists: “the apple” is “az alma”, and “the apples” is
“az almák”. Since <CODE>ngettext</CODE> has to support both types of sentences,
it is classified here, under “two forms”.

<DT>Two forms, singular used for zero and one
<DD>
This is an exceptional case within the language family. The header entry would be:


<PRE>
Plural-Forms: nplurals=2; plural=n>1;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Romanic family
<DD>
French, Brazilian Portuguese
</DL>

<DT>Three forms, special case for zero
<DD>
The header entry would be:


<PRE>
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Baltic family
<DD>
Latvian
</DL>

<DT>Three forms, special cases for one and two
<DD>
The header entry would be:


<PRE>
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Celtic
<DD>
Gaeilge (Irish)
</DL>

<DT>Three forms, special case for numbers ending in 00 or [2-9][0-9]
<DD>
The header entry would be:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Romanic family
<DD>
Romanian
</DL>

<DT>Three forms, special case for numbers ending in 1[2-9]
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
    n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Baltic family
<DD>
Lithuanian
</DL>

<DT>Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
    n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Croatian, Serbian, Russian, Ukrainian
</DL>

<DT>Three forms, special cases for 1 and 2, 3, 4
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Slovak, Czech
</DL>

<DT>Three forms, special case for one and some numbers ending in 2, 3, or 4
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
    n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Polish
</DL>

<DT>Four forms, special case for one and all numbers ending in 02, 03, or 04
<DD>
The header entry would look like this:


<PRE>
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
</PRE>

Languages with this property include:

<DL COMPACT>

<DT>Slavic family
<DD>
Slovenian
</DL>
</DL>

<P>
You might now ask: <CODE>ngettext</CODE> handles only numbers <VAR>n</VAR> of type
<SAMP>‘unsigned long’</SAMP>, so what about larger integer types? What about negative
numbers? What about floating-point numbers?

</P>
<P>
About larger integer types, such as <SAMP>‘uintmax_t’</SAMP> or
<SAMP>‘unsigned long long’</SAMP>: they can be handled by reducing the value to a
range that fits in an <SAMP>‘unsigned long’</SAMP>. Simply casting the value to
<SAMP>‘unsigned long’</SAMP> would not do the right thing, since it would treat
<CODE>ULONG_MAX + 1</CODE> like zero, <CODE>ULONG_MAX + 2</CODE> like singular, and
so on. Here you can exploit the fact that all the plural form formulas
mentioned above eventually become periodic, with a period that is a divisor of 100
(or 1000 or 1000000). So, when you reduce a large value to another one in
the range [1000000, 1999999] that ends in the same 6 decimal digits, you
can assume that it will lead to the same plural form selection. The
following code does this:

</P>

<PRE>
#include <inttypes.h>
#include <limits.h>
uintmax_t nbytes = ...;
printf (ngettext ("The file has %"PRIuMAX" byte.",
                  "The file has %"PRIuMAX" bytes.",
                  (nbytes > ULONG_MAX ?
                   (nbytes % 1000000) + 1000000
                   : nbytes)),
        nbytes);
</PRE>

<P>
Negative and floating-point values usually represent physical entities for
which singular and plural don't clearly apply. In such cases, there is no
need to use <CODE>ngettext</CODE>; a simple <CODE>gettext</CODE> call with a form suitable
for all values will do. For example:

</P>

<PRE>
printf (gettext ("Time elapsed: %.3f seconds"),
        num_milliseconds * 0.001);
</PRE>

<P>
Even if <VAR>num_milliseconds</VAR> happens to be a multiple of 1000, the output

<PRE>
Time elapsed: 1.000 seconds
</PRE>

<P>
is acceptable in English, and similarly for other languages.

</P>


<H3><A NAME="SEC175" HREF="gettext_toc.html#TOC175">11.2.7 Optimization of the *gettext functions</A></H3>
<P>
<A NAME="IDX1039"></A>

</P>
<P>
At this point of the discussion we should talk about an advantage of the
GNU <CODE>gettext</CODE> implementation. Some readers might point out
that an internationalized program might perform poorly if some
string has to be translated in an inner loop. While this is unavoidable
when the string varies from one iteration of the loop to the next, it is
simply a waste of time when the string is always the same. Take the
following example:

</P>

<PRE>
{
  while (...)
    {
      puts (gettext ("Hello world"));
    }
}
</PRE>

<P>
When the locale selection does not change between iterations, the resulting
string is always the same. One way to exploit this is:

</P>

<PRE>
{
  const char *str = gettext ("Hello world");
  while (...)
    {
      puts (str);
    }
}
</PRE>

<P>
But this solution is not usable in all situations (e.g. when the locale
selection changes), nor does it lead to legible code.
</P>
<P>
For this reason, GNU <CODE>gettext</CODE> caches previous translation results.
When the same translation is requested twice, with no new message
catalogs being loaded in between, <CODE>gettext</CODE> will, the second time,
find the result through a single cache lookup.

</P>


<H2><A NAME="SEC176" HREF="gettext_toc.html#TOC176">11.3 Comparing the Two Interfaces</A></H2>
<P>
<A NAME="IDX1040"></A>
<A NAME="IDX1041"></A>

</P>

<P>
The following discussion is perhaps a little biased. As said
above, we implemented GNU <CODE>gettext</CODE> following the Uniforum
proposal, and this surely has its reasons. But the discussion should show
how we came to this decision.

</P>
<P>
First we take a look at the development process. When we write an
application using the NLS provided by <CODE>gettext</CODE>, we proceed as usual.
Only when we come to a string that might be seen by the users, and thus
has to be translated, do we use <CODE>gettext("...")</CODE> instead of
<CODE>"..."</CODE>. At the beginning of each source file (or in a central
header file) we define

</P>

<PRE>
#define gettext(String) (String)
</PRE>

<P>
Even this definition can be avoided when the system supports the
<CODE>gettext</CODE> function in its C library. When we compile this code, the
result is the same as if no NLS code were used. When you take a look at
the GNU <CODE>gettext</CODE> code, you will see that we use <CODE>_("...")</CODE>
instead of <CODE>gettext("...")</CODE>. This reduces the number of
additional characters per translatable string to <EM>3</EM> (in words:
three).
</P>
<P>
When a production version of the program is needed, we simply replace
the definition

</P>

<PRE>
#define _(String) (String)
</PRE>

<P>
by

</P>
<P>
<A NAME="IDX1042"></A>

<PRE>
#include <libintl.h>
#define _(String) gettext (String)
</PRE>

<P>
Additionally, we run the program <TT>‘xgettext’</TT> on all source code files
that contain translatable strings, and that's it: we have a running
program that does not depend on translations being available, but that
can use any that become available.

</P>
<P>
<A NAME="IDX1043"></A>
The same procedure can be applied to the <CODE>gettext_noop</CODE> invocations
(see section <A HREF="gettext_4.html#SEC18">4.7 Special Cases of Translatable Strings</A>). One usually defines <CODE>gettext_noop</CODE> as a
no-op macro. So you should consider the following code for your project:

</P>

<PRE>
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
</PRE>

<P>
<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>. The <TT>‘Makefile’</TT> in
the <TT>‘po/’</TT> directory of GNU <CODE>gettext</CODE> knows both of the
mentioned short forms by default, so you are invited to follow this proposal
for your own convenience.

</P>
<P>
Now to <CODE>catgets</CODE>. The main problem is the work for the
programmer. Every time he comes to a translatable string, he has to
define a number (or a symbolic constant) that also has to be defined in
the message catalog file. He also has to take care of duplicate
entries, duplicate message IDs, etc. If he wants the message catalog to
have the same quality as the one the GNU <CODE>gettext</CODE> program
provides, he also has to put the descriptive comments for the strings and
their source code locations into the message catalog. This is
This is 1360nearly a Mission: Impossible. 1361 1362</P> 1363<P> 1364But there are also some points people might call advantages speaking for 1365<CODE>catgets</CODE>. If you have a single word in a string and this string 1366is used in different contexts it is likely that in one or the other 1367language the word has different translations. Example: 1368 1369</P> 1370 1371<PRE> 1372printf ("%s: %d", gettext ("number"), number_of_errors) 1373 1374printf ("you should see %d %s", number_count, 1375 number_count == 1 ? gettext ("number") : gettext ("numbers")) 1376</PRE> 1377 1378<P> 1379Here we have to translate two times the string <CODE>"number"</CODE>. Even 1380if you do not speak a language beside English it might be possible to 1381recognize that the two words have a different meaning. In German the 1382first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second 1383to <CODE>"Zahl"</CODE>. 1384 1385</P> 1386<P> 1387Now you can say that this example is really esoteric. And you are 1388right! This is exactly how we felt about this problem and decide that 1389it does not weight that much. The solution for the above problem could 1390be very easy: 1391 1392</P> 1393 1394<PRE> 1395printf ("%s %d", gettext ("number:"), number_of_errors) 1396 1397printf (number_count == 1 ? gettext ("you should see %d number") 1398 : gettext ("you should see %d numbers"), 1399 number_count) 1400</PRE> 1401 1402<P> 1403We believe that we can solve all conflicts with this method. If it is 1404difficult one can also consider changing one of the conflicting string a 1405little bit. But it is not impossible to overcome. 1406 1407</P> 1408<P> 1409<CODE>catgets</CODE> allows same original entry to have different translations, 1410but <CODE>gettext</CODE> has another, scalable approach for solving ambiguities 1411of this kind: See section <A HREF="gettext_11.html#SEC170">11.2.2 Solving Ambiguities</A>. 
</P>


<H2><A NAME="SEC177" HREF="gettext_toc.html#TOC177">11.4 Using libintl.a in own programs</A></H2>

<P>
Starting with version 0.9.4, the library <CODE>libintl.a</CODE> should be
self-contained, i.e., you can use it in your own programs without
providing additional functions. The <TT>‘Makefile’</TT> will put the header
and the library in directories selected using <CODE>$(prefix)</CODE>.

</P>


<H2><A NAME="SEC178" HREF="gettext_toc.html#TOC178">11.5 Being a <CODE>gettext</CODE> grok</A></H2>

<P>
<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
revised.

</P>
<P>
To fully exploit the functionality of the GNU <CODE>gettext</CODE> library, it
is surely helpful to read the source code. But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:

</P>

<UL>
<LI>Changing the language at runtime

<A NAME="IDX1044"></A>

For interactive programs it might be useful to let the user select the
language at runtime. To understand how to do this, one needs to know
how the language in use is determined while executing the <CODE>gettext</CODE>
function. The method presented here only works correctly
with the GNU implementation of the <CODE>gettext</CODE> functions.

In the function <CODE>dcgettext</CODE>, the current setting of the
highest-priority environment variable is determined and used at every call.
Highest priority here refers to the following list, in order of
decreasing priority:


<OL>
<LI><CODE>LANGUAGE</CODE>

<A NAME="IDX1045"></A>

<A NAME="IDX1046"></A>
<LI><CODE>LC_ALL</CODE>

<A NAME="IDX1047"></A>
<A NAME="IDX1048"></A>
<A NAME="IDX1049"></A>
<A NAME="IDX1050"></A>
<A NAME="IDX1051"></A>
<A NAME="IDX1052"></A>
<LI><CODE>LC_xxx</CODE>, according to the selected locale category

<A NAME="IDX1053"></A>
<LI><CODE>LANG</CODE>

</OL>

Afterwards, the path is constructed using the value found, and the
translation file is loaded if it is available.

What happens now when the value of, say, <CODE>LANGUAGE</CODE> changes? According
to the process explained above, the new value of this variable is found
as soon as the <CODE>dcgettext</CODE> function is called. But this also means
that the (perhaps) different message catalog file is loaded. In other
words: the language in use is changed.

But there is one little catch. The code for gcc-2.7.0 and up provides
some optimization. This optimization normally prevents the
<CODE>dcgettext</CODE> function from being called as long as no new catalog is
loaded. But if <CODE>dcgettext</CODE> is not called, the program also cannot
notice that the <CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext_11.html#SEC175">11.2.7 Optimization of the *gettext functions</A>). A
solution for this is very easy: include the following code in the
language switching function.


<PRE>
  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  {
    extern int  _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  }
</PRE>

<A NAME="IDX1054"></A>
The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>‘loadmsgcat.c’</TT>.
You don't need to know what this is for.
But it can be used to detect
whether a <CODE>gettext</CODE> implementation is GNU gettext rather than a
non-GNU system's native gettext implementation.

</UL>



<H2><A NAME="SEC179" HREF="gettext_toc.html#TOC179">11.6 Temporary Notes for the Programmers Chapter</A></H2>

<P>
<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
revised.

</P>



<H3><A NAME="SEC180" HREF="gettext_toc.html#TOC180">11.6.1 Temporary - Two Possible Implementations</A></H3>

<P>
There are two competing methods for language-independent messages:
the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
method. The <CODE>catgets</CODE> method indexes messages by integers; the
<CODE>gettext</CODE> method indexes them by their English translations.
The <CODE>catgets</CODE> method has been around longer and is supported
by more vendors. The <CODE>gettext</CODE> method is supported by Sun,
and the COSE multi-vendor initiative is reportedly supporting it as well.
Neither method is a POSIX standard; the POSIX.1
committee had a lot of disagreement in this area.

</P>
<P>
Neither one is in the POSIX standard. There was much disagreement
in the POSIX.1 committee about using the <CODE>gettext</CODE> routines
vs. <CODE>catgets</CODE> (XPG). In the end the committee couldn't
agree on anything, so no messaging system was included as part
of the standard. I believe the informative annex of the standard
includes the XPG3 messaging interfaces, “...as an example of
a messaging system that has been implemented...”

</P>
<P>
They were very careful not to say anywhere that you should use one
set of interfaces over the other. For more on this topic, please
see the Programming for Internationalization FAQ.
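</P>
<P>
To make the difference in indexing described at the start of this section
concrete, here is a minimal sketch that looks up the same message both ways.
The set number 1, the message number 42, and both function names are
hypothetical; with <CODE>catgets</CODE> such numbers have to be kept in sync by
hand with a message catalog source file, whereas <CODE>gettext</CODE> uses the
English string itself as the key.
</P>
<PRE>
```c
#include <libintl.h>
#include <nl_types.h>

/* Hypothetical set and message numbers; with catgets they have to be kept
   in sync by hand with the message catalog source file.  */
enum { GREETING_SET = 1, GREETING_MSG = 42 };

/* X/Open style: the message is identified by (set, message number).
   The last argument is returned if the catalog or message is missing.  */
const char *
greeting_catgets (nl_catd catd)
{
  return catgets (catd, GREETING_SET, GREETING_MSG, "Hello world");
}

/* Uniforum style: the untranslated English string is itself the key.  */
const char *
greeting_gettext (void)
{
  return gettext ("Hello world");
}
```
</PRE>
<P>
When no catalog is available, both calls fall back to the English text:
<CODE>catgets</CODE> returns its last argument, and <CODE>gettext</CODE> returns
its argument unchanged.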
</P>


<H3><A NAME="SEC181" HREF="gettext_toc.html#TOC181">11.6.2 Temporary - About <CODE>catgets</CODE></A></H3>

<P>
There have been a few discussions lately on the use of
<CODE>catgets</CODE> as a base. I think it is important to present both
sides of the argument and hence am opting to play devil's advocate
for a little bit.

</P>
<P>
I won't deny the fact that <CODE>catgets</CODE> could have been designed
a lot better. It currently has quite a number of limitations, and
these have already been pointed out.

</P>
<P>
However, there is a great deal to be said for consistency and
standardization. A common recurring problem when writing Unix
software is the myriad of portability problems across Unix platforms.
It seems as if every Unix vendor had a look at the operating system
and found parts they could improve upon. Many of these
modifications are no doubt innovative and solve real problems.
However, software developers have a hard time keeping up with all
these changes across so many platforms.

</P>
<P>
And this has prompted the Unix vendors to begin to standardize their
systems. Hence the impetus for Spec1170. Every major Unix vendor
has committed to supporting this standard, and every Unix software
developer waits with glee for the day they can write software to this
standard and simply recompile (without having to use autoconf)
across different platforms.

</P>
<P>
As I understand it, Spec1170 is roughly based upon version 4 of the
X/Open Portability Guide (XPG4). Because <CODE>catgets</CODE> and
friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>
is a part of Spec1170 and hence will become a standardized component
of all Unix systems.
</P>


<H3><A NAME="SEC182" HREF="gettext_toc.html#TOC182">11.6.3 Temporary - Why a single implementation</A></H3>

<P>
Now it seems kind of wasteful to me to have two different systems
installed for accessing message catalogs. If we do want to remedy
<CODE>catgets</CODE> deficiencies, why don't we try to expand <CODE>catgets</CODE>
(in a compatible manner) rather than implement an entirely new system?
Otherwise, we'll end up with two message catalog access systems installed
with an operating system: one set of routines for packages using GNU
<CODE>gettext</CODE> for their internationalization, and another set of routines
(<CODE>catgets</CODE>) for all other software. Bloated?

</P>
<P>
Suppose another catalog access system is implemented. Which one do
we recommend? At least for Linux, we need to attract as many
software developers as possible. Hence we need to make it as easy as
possible for them to port their software, which means supporting
<CODE>catgets</CODE>. We will be implementing the <CODE>libintl</CODE> code
within our <CODE>libc</CODE>, but does this mean we also have to incorporate
another message catalog access scheme within our <CODE>libc</CODE>?
And what about people who are going to be using the <CODE>libintl</CODE>
+ non-<CODE>catgets</CODE> routines? When they port their software to
other platforms, they're now going to have to include the front-end
(<CODE>libintl</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>
access routines) with their software instead of just including the
<CODE>libintl</CODE> code.

</P>
<P>
Message catalog support is, however, only the tip of the iceberg.
What about the data for the other locale categories? They also have
a number of deficiencies. Are we going to abandon them as well and
Are we going to abandon them as well and 1639develop another duplicate set of routines (should <CODE>libintl</CODE> 1640expand beyond message catalog support)? 1641 1642</P> 1643<P> 1644Like many parts of Unix that can be improved upon, we're stuck with balancing 1645compatibility with the past with useful improvements and innovations for 1646the future. 1647 1648</P> 1649 1650 1651<H3><A NAME="SEC183" HREF="gettext_toc.html#TOC183">11.6.4 Temporary - Notes</A></H3> 1652 1653<P> 1654X/Open agreed very late on the standard form so that many 1655implementations differ from the final form. Both of my system (old 1656Linux catgets and Ultrix-4) have a strange variation. 1657 1658</P> 1659<P> 1660OK. After incorporating the last changes I have to spend some time on 1661making the GNU/Linux <CODE>libc</CODE> <CODE>gettext</CODE> functions. So in future 1662Solaris is not the only system having <CODE>gettext</CODE>. 1663 1664</P> 1665<P><HR><P> 1666Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_10.html">previous</A>, <A HREF="gettext_12.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>. 1667</BODY> 1668</HTML> 1669