<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 27 November 2006 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 4 Preparing Program Sources</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_3.html">previous</A>, <A HREF="gettext_5.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC11" HREF="gettext_toc.html#TOC11">4 Preparing Program Sources</A></H1>
<P>
<A NAME="IDX102"></A>

</P>

<P>
For the programmer, changes to the C source code fall into three
categories. First, you have to make the localization functions
known to all modules needing message translation. Second, you should
properly trigger the operation of GNU <CODE>gettext</CODE> when the program
initializes, usually from the <CODE>main</CODE> function. Last, you should
identify, adjust and mark all constant strings in your program
needing translation.
</P>



<H2><A NAME="SEC12" HREF="gettext_toc.html#TOC12">4.1 Importing the <CODE>gettext</CODE> declaration</A></H2>

<P>
Presuming that your set of programs, or package, has been adjusted
so all needed GNU <CODE>gettext</CODE> files are available, and your
<TT>‘Makefile’</TT> files are adjusted (see section <A HREF="gettext_13.html#SEC196">13 The Maintainer's View</A>), each C module
having translated C strings should contain the line:

</P>
<P>
<A NAME="IDX103"></A>

<PRE>
#include &lt;libintl.h&gt;
</PRE>

<P>
Similarly, each C module containing <CODE>printf()</CODE>/<CODE>fprintf()</CODE>/...
calls with a format string that could be a translated C string (even if
the C string comes from a different C module) should contain the line:

</P>

<PRE>
#include &lt;libintl.h&gt;
</PRE>



<H2><A NAME="SEC13" HREF="gettext_toc.html#TOC13">4.2 Triggering <CODE>gettext</CODE> Operations</A></H2>

<P>
<A NAME="IDX104"></A>
The initialization of locale data should be done with more or less
the same code in every program, as demonstrated below:

</P>

<PRE>
int
main (int argc, char *argv[])
{
  ...
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  ...
}
</PRE>

<P>
<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
<TT>‘config.h’</TT> or by the Makefile. For now consult the <CODE>gettext</CODE>
or <CODE>hello</CODE> sources for more information.

</P>
<P>
<A NAME="IDX105"></A>
<A NAME="IDX106"></A>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and especially
<CODE>LC_CTYPE</CODE>.
This latter category is responsible for determining
character classes with the <CODE>isalnum</CODE> etc. functions from
<TT>‘ctype.h’</TT>, which could be wrong, especially for programs that
process some kind of input language. For example, this would mean that
source code using the ç (c-cedilla) character is runnable in
France but not in the U.S.

</P>
<P>
Some systems also have problems with parsing numbers using the
<CODE>scanf</CODE> functions if a locale other than the <CODE>"C"</CODE> locale
is selected through <CODE>LC_ALL</CODE>.
The standards say that formats in addition to the one known in the
<CODE>"C"</CODE> locale might be recognized. But some systems seem to reject
numbers in the <CODE>"C"</CODE> locale format. In some situations, it might
also be a problem with the notation itself which makes it impossible to
recognize whether the number is in the <CODE>"C"</CODE> locale or the local
format. This can happen if thousands separator characters are used.
Some locales define this character, according to national
conventions, as <CODE>'.'</CODE>, which is the same character used in the
<CODE>"C"</CODE> locale to denote the decimal point.

</P>
<P>
So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
code above by a sequence of <CODE>setlocale</CODE> lines

</P>

<PRE>
{
  ...
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  ...
}
</PRE>

<P>
<A NAME="IDX107"></A>
<A NAME="IDX108"></A>
<A NAME="IDX109"></A>
<A NAME="IDX110"></A>
<A NAME="IDX111"></A>
<A NAME="IDX112"></A>
<A NAME="IDX113"></A>
On all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
<CODE>LC_MESSAGES</CODE>, <CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>,
<CODE>LC_NUMERIC</CODE>, and <CODE>LC_TIME</CODE> are available. On some systems
which are only ISO C compliant, <CODE>LC_MESSAGES</CODE> is missing, but
a substitute for it is defined in GNU gettext's <CODE>&lt;libintl.h&gt;</CODE>.

</P>
<P>
Note that changing the <CODE>LC_CTYPE</CODE> also affects the functions
declared in the <CODE>&lt;ctype.h&gt;</CODE> standard header. If this is not
desirable in your application (for example in a compiler's parser),
you can use a set of substitute functions which hardwire the C locale,
such as found in the <CODE>&lt;c-ctype.h&gt;</CODE> and <CODE>&lt;c-ctype.c&gt;</CODE> files
in the gettext source distribution.
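</P>
<P>
A substitute of this kind can be quite small. The following is only a minimal
sketch, assuming an ASCII-compatible execution character set; the function
name <CODE>ascii_isalnum</CODE> is hypothetical and is not the interface of the
<CODE>c-ctype</CODE> files mentioned above:

</P>

```c
#include <stdbool.h>

/* Hypothetical helper (not taken from c-ctype.h): tests for an ASCII letter
   or digit using fixed ranges, so the result does not depend on setlocale.
   Assumes an ASCII-compatible execution character set.  */
bool
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'A' && c <= 'Z')
         || (c >= 'a' && c <= 'z');
}
```

<P>
Because the ranges are fixed in the code, a call such as
<CODE>ascii_isalnum (0xE7)</CODE> yields false in every locale, whereas
<CODE>isalnum</CODE> may accept that byte once <CODE>LC_CTYPE</CODE> has been set
from the environment.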
</P>
<P>
It is also possible to switch the locale back and forth between the
environment dependent locale and the C locale, but this approach is
normally avoided because a <CODE>setlocale</CODE> call is expensive,
because it is tedious to determine the places where a locale switch
is needed in a large program's source, and because switching a locale
is not multithread-safe.

</P>


<H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">4.3 Preparing Translatable Strings</A></H2>

<P>
<A NAME="IDX114"></A>
Before strings can be marked for translation, they sometimes need to
be adjusted. Usually preparing a string for translation is done right
before marking it, during the marking phase which is described in the
next sections. What you have to keep in mind while doing that is the
following.

</P>

<UL>
<LI>

Decent English style.

<LI>

Entire sentences.

<LI>

Split at paragraphs.

<LI>

Use format strings instead of string concatenation.

<LI>

Avoid unusual markup and unusual control characters.
</UL>

<P>
Let's look at some examples of these guidelines.

</P>
<P>
<A NAME="IDX115"></A>
Translatable strings should be in good English style. If slang language
with abbreviations and shortcuts is used, often translators will not
understand the message and will produce very inappropriate translations.

</P>

<PRE>
"%s: is parameter\n"
</PRE>

<P>
This is nearly untranslatable: Is the displayed item <EM>a</EM> parameter or
<EM>the</EM> parameter?

</P>

<PRE>
"No match"
</PRE>

<P>
The ambiguity in this message makes it unintelligible: Is the program
attempting to set something on fire? Does it mean "The given object does
not match the template"? Does it mean "The template does not fit for any
of the objects"?

</P>
<P>
<A NAME="IDX116"></A>
In both cases, adding more words to the message will help both the
translator and the English speaking user.

</P>
<P>
<A NAME="IDX117"></A>
Translatable strings should be entire sentences. It is often not possible
to translate single verbs or adjectives in a substitutable way.
</P>

<PRE>
printf ("File %s is %s protected", filename, rw ? "write" : "read");
</PRE>

<P>
Most translators will not look at the source and will thus only see the
string <CODE>"File %s is %s protected"</CODE>, which is unintelligible. Change
this to

</P>

<PRE>
printf (rw ? "File %s is write protected" : "File %s is read protected",
        filename);
</PRE>

<P>
This way the translator will not only understand the message, she will
also be able to find the appropriate grammatical construction. A French
translator for example translates "write protected" like "protected
against writing".

</P>
<P>
Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence. There are
usually more interdependencies between words than in English. The
consequence is that asking a translator to translate two half-sentences
and then combining these two half-sentences through dumb string concatenation
will not work for many languages, even though it would work for English.
That's why translators need to handle entire sentences.
</P>
<P>
Often sentences don't fit into a single line. If a sentence is output
using two subsequent <CODE>printf</CODE> statements, like this

</P>

<PRE>
printf ("Locale charset \"%s\" is different from\n", lcharset);
printf ("input file charset \"%s\".\n", fcharset);
</PRE>

<P>
the translator would have to translate two half sentences, but nothing
in the POT file would tell her that the two half sentences belong together.
It is necessary to merge the two <CODE>printf</CODE> statements so that the
translator can handle the entire sentence at once and decide at which
place to insert a line break in the translation (if at all):

</P>

<PRE>
printf ("Locale charset \"%s\" is different from\n\
input file charset \"%s\".\n", lcharset, fcharset);
</PRE>

<P>
You may now ask: how about two or more adjacent sentences? Like in this case:

</P>

<PRE>
puts ("Apollo 13 scenario: Stack overflow handling failed.");
puts ("On the next stack overflow we will crash!!!");
</PRE>

<P>
Should these two statements be merged into a single one?
I would recommend merging
them if the two sentences are related to each other, because then it
makes it easier for the translator to understand and translate both. On
the other hand, if one of the two messages is a stereotypic one, occurring
in other places as well, you will do a favour to the translator by not
merging the two. (Identical messages occurring in several places are
combined by xgettext, so the translator has to handle them once only.)

</P>
<P>
<A NAME="IDX118"></A>
Translatable strings should be limited to one paragraph; don't let a
single message be longer than ten lines. The reason is that when the
translatable string changes, the translator is faced with the task of
updating the entire translated string. Maybe only a single word will
have changed in the English string, but the translator doesn't see that
(with the current translation tools), therefore she has to proofread
the entire message.

</P>
<P>
<A NAME="IDX119"></A>
Many GNU programs have a <SAMP>‘--help’</SAMP> output that extends over several
screen pages. It is a courtesy towards the translators to split such a
message into several ones of five to ten lines each. While doing that,
you can also attempt to split the documented options into groups,
such as the input options, the output options, and the informative
output options. This will help every user to find the option he is
looking for.
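</P>
<P>
Such a split might look like the following sketch. The option names and the
function <CODE>print_help</CODE> are invented for illustration, and
<CODE>_</CODE> is defined here as a do-nothing stand-in so that the fragment
compiles without <CODE>libintl</CODE>; a real program would map it to
<CODE>gettext</CODE>:

</P>

```c
#include <stdio.h>

/* Stand-in marker so this sketch builds without libintl (an assumption made
   for illustration); a real program would call gettext here.  */
#define _(String) (String)

/* Writes the help text as three separate translatable messages, one per
   option group, and returns the number of messages written.
   The options shown are invented.  */
int
print_help (FILE *out)
{
  int count = 0;

  fputs (_("Input options:\n"
           "  -i, --input=FILE     read from FILE\n"), out);
  count++;

  fputs (_("Output options:\n"
           "  -o, --output=FILE    write to FILE\n"), out);
  count++;

  fputs (_("Informative output:\n"
           "  -h, --help           display this help and exit\n"), out);
  count++;

  return count;
}
```

<P>
Each group is then a separate entry for the translator, so a change to one
option only affects the translation of its own group.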
</P>
<P>
<A NAME="IDX120"></A>
<A NAME="IDX121"></A>
Hardcoded string concatenation is sometimes used to construct English
strings:

</P>

<PRE>
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");
</PRE>

<P>
In order to present to the translator only entire sentences, and also
because in some languages the translator might want to swap the order
of <CODE>object1</CODE> and <CODE>object2</CODE>, it is necessary to change this
to use a format string:

</P>

<PRE>
sprintf (s, "Replace %s with %s?", object1, object2);
</PRE>

<P>
<A NAME="IDX122"></A>
A similar case is compile time concatenation of strings. The ISO C 99
include file <CODE>&lt;inttypes.h&gt;</CODE> contains a macro <CODE>PRId64</CODE> that
can be used as a formatting directive for outputting an <SAMP>‘int64_t’</SAMP>
integer through <CODE>printf</CODE>. It expands to a constant string, usually
"d" or "ld" or "lld" or something like this, depending on the platform.
Assume you have code like

</P>

<PRE>
printf ("The amount is %0" PRId64 "\n", number);
</PRE>

<P>
The <CODE>gettext</CODE> tools and library have special support for these
<CODE>&lt;inttypes.h&gt;</CODE> macros. You can therefore simply write

</P>

<PRE>
printf (gettext ("The amount is %0" PRId64 "\n"), number);
</PRE>

<P>
The PO file will contain the string "The amount is %0&lt;PRId64&gt;\n".
The translators will provide a translation containing "%0&lt;PRId64&gt;"
as well, and at runtime the <CODE>gettext</CODE> function's result will
contain the appropriate constant string, "d" or "ld" or "lld".

</P>
<P>
This works only for the predefined <CODE>&lt;inttypes.h&gt;</CODE> macros. If
you have defined your own similar macros, let's say <SAMP>‘MYPRId64’</SAMP>,
that are not known to <CODE>xgettext</CODE>, the solution for this problem
is to change the code like this:

</P>

<PRE>
char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);
</PRE>

<P>
This means that you put the platform dependent code in one statement, and the
internationalization code in a different statement.
Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether in decimal, octal or hexadecimal.

</P>
<P>
<A NAME="IDX123"></A>
<A NAME="IDX124"></A>
All this applies to other programming languages as well. For example, in
Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator. Like in C, in Java, you would change

</P>

<PRE>
System.out.println("Replace "+object1+" with "+object2+"?");
</PRE>

<P>
into a statement involving a format string:

</P>

<PRE>
System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));
</PRE>

<P>
Similarly, in C#, you would change

</P>

<PRE>
Console.WriteLine("Replace "+object1+" with "+object2+"?");
</PRE>

<P>
into a statement involving a format string:

</P>

<PRE>
Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));
</PRE>

<P>
<A NAME="IDX125"></A>
<A NAME="IDX126"></A>
Unusual markup or control characters should not be used in translatable
strings. Translators will likely not understand the particular meaning
of the markup or control characters.

</P>
<P>
For example, if you have a convention that <SAMP>‘|’</SAMP> delimits the
left-hand and right-hand part of some GUI elements, translators will
often not understand it without specific comments. It might be
better to have the translator translate the left-hand and right-hand
part separately.

</P>
<P>
Another example is the <SAMP>‘argp’</SAMP> convention to use a single <SAMP>‘\v’</SAMP>
(vertical tab) control character to delimit two sections inside a
string. This is flawed. Some translators may convert it to a simple
newline, some to blank lines. With some PO file editors it may not be
easy to even enter a vertical tab control character. So, you cannot
be sure that the translation will contain a <SAMP>‘\v’</SAMP> character, at the
corresponding position. The solution is, again, to let the translator
translate two separate strings and combine at run-time the two translated
strings with the <SAMP>‘\v’</SAMP> required by the convention.

</P>
<P>
HTML markup, however, is common enough that it's probably ok to use in
translatable strings.
But please bear in mind that the GNU gettext tools
don't verify that the translations are well-formed HTML.

</P>


<H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">4.4 How Marks Appear in Sources</A></H2>
<P>
<A NAME="IDX127"></A>

</P>
<P>
All strings requiring translation should be marked in the C sources. Marking
is done in such a way that each translatable string appears to be
the sole argument of some function or preprocessor macro. There are
only a few such possible functions or macros meant for translation,
and their names are said to be marking keywords. The marking is
attached to strings themselves, rather than to what we do with them.
This approach has more uses. A blatant example is an error message
produced by formatting. The format string needs translation, as
well as some strings inserted through some <SAMP>‘%s’</SAMP> specification
in the format, while the result from <CODE>sprintf</CODE> may have so many
different instances that it is impractical to list them all in some
<SAMP>‘error_string_out()’</SAMP> routine, say.

</P>
<P>
This marking operation has two goals. The first goal of marking
is for triggering the retrieval of the translation, at run time.
The keyword is possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string.
Most localizable strings are found in executable
positions, that is, attached to variables or given as parameters to
functions. But this is not universal usage, and some translatable
strings appear in structured initializations. See section <A HREF="gettext_4.html#SEC18">4.7 Special Cases of Translatable Strings</A>.

</P>
<P>
The second goal of the marking operation is to help <CODE>xgettext</CODE>
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.

</P>
<P>
The canonical keyword for marking translatable strings is
<SAMP>‘gettext’</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
package. For packages making only light use of the <SAMP>‘gettext’</SAMP>
keyword, macro or function, it is easily used <EM>as is</EM>. However,
for packages using the <CODE>gettext</CODE> interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name. Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them forcefully, all the time, that they
are internationalized. Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.
</P>
<P>
<A NAME="IDX128"></A>
Many packages use <SAMP>‘_’</SAMP> (a simple underline) as a keyword,
and write <SAMP>‘_("Translatable string")’</SAMP> instead of <SAMP>‘gettext
("Translatable string")’</SAMP>. Further, the rule from the GNU coding
standards that requires a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underline and the two parentheses.
However, even if GNU <CODE>gettext</CODE> uses this convention internally,
it does not offer it officially. The real, genuine keyword is truly
<SAMP>‘gettext’</SAMP> indeed. It is fairly easy for those wanting to use
<SAMP>‘_’</SAMP> instead of <SAMP>‘gettext’</SAMP> to declare:

</P>

<PRE>
#include &lt;libintl.h&gt;
#define _(String) gettext (String)
</PRE>

<P>
instead of merely using <SAMP>‘#include &lt;libintl.h&gt;’</SAMP>.

</P>
<P>
The marking keywords <SAMP>‘gettext’</SAMP> and <SAMP>‘_’</SAMP> take the translatable
string as sole argument. It is also possible to define marking functions
that take it at another argument position. It is even possible to make
the marked argument position depend on the total number of arguments of
the function call; this is useful in C++.
All this is achieved using
<CODE>xgettext</CODE>'s <SAMP>‘--keyword’</SAMP> option.

</P>
<P>
Note also that long strings can be split across lines, into multiple
adjacent string tokens. Automatic string concatenation is performed
at compile time according to ISO C and ISO C++; <CODE>xgettext</CODE> also
supports this syntax.

</P>
<P>
Later on, the maintenance is relatively easy. If, as a programmer,
you add or modify a string, you will have to ask yourself if the
new or altered string requires translation, and include it within
<SAMP>‘_()’</SAMP> if you think it should be translated. For example, <SAMP>‘"%s"’</SAMP>
is a string <EM>not</EM> requiring translation. But
<SAMP>‘"%s: %d"’</SAMP> <EM>does</EM> require translation, because in French, unlike
in English, it's customary to put a space before a colon.

</P>


<H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">4.5 Marking Translatable Strings</A></H2>
<P>
<A NAME="IDX129"></A>

</P>
<P>
In PO mode, one set of features is meant more for the programmer than
for the translator, and allows him to interactively mark which strings,
in a set of program sources, are translatable, and which are not.
Even if it is a fairly easy job for a programmer to find and mark
such strings by other means, using any editor of his choice, PO mode
makes this work more comfortable. Further, this gives translators
who feel a little like programmers, or programmers who feel a little
like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.

</P>
<P>
<A NAME="IDX130"></A>
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project,
prior to using these PO file commands. This is easy to do. In any
shell window, change the directory to the root of your project, then
execute a command resembling:

</P>

<PRE>
etags src/*.[hc] lib/*.[hc]
</PRE>

<P>
presuming here you want to process all <TT>‘.h’</TT> and <TT>‘.c’</TT> files
from the <TT>‘src/’</TT> and <TT>‘lib/’</TT> directories. This command will
scan all these files and create a <TT>‘TAGS’</TT> file in your root
directory, summarizing their contents in a special file
format Emacs can understand.

</P>
<P>
<A NAME="IDX131"></A>
For packages following the GNU coding standards, there is
a make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
all directories and for all files containing source code.

</P>
<P>
Once your <TT>‘TAGS’</TT> file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
But these commands are necessarily driven from within a PO file
window, and it is likely that you do not even have such a PO file yet.
This is not a problem at all, as you may safely open a new, empty PO
file, mainly for using these commands. This empty PO file will slowly
fill up while you mark strings as translatable in your program sources.

</P>
<DL COMPACT>

<DT><KBD>,</KBD>
<DD>
<A NAME="IDX132"></A>
Search through program sources for a string which looks like a
candidate for translation (<CODE>po-tags-search</CODE>).

<DT><KBD>M-,</KBD>
<DD>
<A NAME="IDX133"></A>
Mark the last string found with <SAMP>‘_()’</SAMP> (<CODE>po-mark-translatable</CODE>).

<DT><KBD>M-.</KBD>
<DD>
<A NAME="IDX134"></A>
Mark the last string found with a keyword taken from a set of possible
keywords.
With a prefix argument, this command allows some management of these
keywords (<CODE>po-select-mark-and-mark</CODE>).

</DL>

<P>
<A NAME="IDX135"></A>
The <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
occurrence of a string which looks like a possible candidate for
translation, and displays the program source in another Emacs window,
positioned in such a way that the string is near the top of this other
window. If the string is too long to fit entirely in this window, it is
positioned so only its end is shown. In any case, the cursor
is left in the PO file window. If the shown string would be better
presented differently in different native languages, you may mark it
using <KBD>M-,</KBD> or <KBD>M-.</KBD>. Otherwise, you might rather ignore it
and skip to the next string by merely repeating the <KBD>,</KBD> command.

</P>
<P>
A string is a good candidate for translation if it contains a sequence
of three or more letters. A string containing at most two letters in
a row will be considered as a candidate if it has more letters than
non-letters. The command disregards strings containing no letters,
or isolated letters only. It also disregards strings within comments,
or strings already marked with some keyword PO mode knows (see below).
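</P>

<P>
The selection rule just described can be sketched in C as follows. This is
only an illustration of the rule as worded in the text, not PO mode's actual
Emacs Lisp implementation. The function name <CODE>looks_translatable</CODE>
is hypothetical, and "letter" is assumed to mean whatever <CODE>isalpha</CODE>
accepts. Comments and already marked strings are not considered here.
</P>

<PRE>
#include <ctype.h>

/* Hypothetical helper: return 1 if S looks like a candidate for
   translation under the rule described in the text, 0 otherwise.  */
int
looks_translatable (const char *s)
{
  int letters = 0;   /* total number of letters */
  int others = 0;    /* total number of non-letters */
  int run = 0;       /* length of the current run of letters */
  int longest = 0;   /* length of the longest run of letters */

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)
    return 1;                   /* three or more letters in a row */
  if (longest == 2)
    return letters > others;    /* more letters than non-letters */
  return 0;                     /* no letters, or isolated letters only */
}
</PRE>

<P>
Under this sketch, <CODE>"Hello"</CODE> and <CODE>"ab"</CODE> qualify as candidates,
while <CODE>"%s"</CODE> and <CODE>"ab: 12"</CODE> do not.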

</P>
<P>
If you have never told Emacs about some <TT>‘TAGS’</TT> file to use, the
command will request that you specify one from the minibuffer, the
first time you use the command. You may later change your <TT>‘TAGS’</TT>
file by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
which will ask you to name the precise <TT>‘TAGS’</TT> file you want
to use. See section ‘Tag Tables’ in <CITE>The Emacs Editor</CITE>.

</P>
<P>
Each time you use the <KBD>,</KBD> command, the search resumes from where it was
left by the previous search, and goes through all program sources,
obeying the <TT>‘TAGS’</TT> file, until all sources have been processed.
However, by giving a prefix argument to the command (<KBD>C-u
,</KBD>), you may request that the search be restarted all over again
from the first program source; but in this case, strings that you
recently marked as translatable will be automatically skipped.

</P>
<P>
Using this <KBD>,</KBD> command does not prevent the use of other regular
Emacs tags commands. For example, regular <CODE>tags-search</CODE> or
<CODE>tags-query-replace</CODE> commands may be used without disrupting the
independent <KBD>,</KBD> search sequence.
However, as implemented, the
<EM>initial</EM> <KBD>,</KBD> command (or a <KBD>,</KBD> command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.

</P>
<P>
<A NAME="IDX136"></A>
<A NAME="IDX137"></A>
The <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
most recently found string with the <SAMP>‘_’</SAMP> keyword. The <KBD>M-.</KBD>
(<CODE>po-select-mark-and-mark</CODE>) command will request that you type
one keyword from the minibuffer and use that keyword for marking
the string. Both commands will automatically create a new untranslated
PO file entry for the string being marked, and make it the
current entry (making it easy for you to immediately proceed to its
translation, if you feel like doing it right away). It is possible
that the modifications made to the program source by <KBD>M-,</KBD> or
<KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
to break and re-indent this line differently. You may use the <KBD>O</KBD>
command from PO mode, or any other window changing command from
Emacs, to break out into the program source window, and do any
needed adjustments. You will have to use some regular Emacs command
to return the cursor to the PO file window, if you want to use the
<KBD>,</KBD> command for the next string, say.

</P>
<P>
The <KBD>M-.</KBD> command has a few built-in speedups, so you do not
have to explicitly type all keywords all the time. The first such
speedup is that you are presented with a <EM>preferred</EM> keyword,
which you may accept by merely typing <KBD>RET</KBD> at the prompt.
The second speedup is that you may type any unambiguous prefix of the
keyword you really mean, and the command will complete it automatically
for you. This also means that PO mode has to <EM>know</EM> all
your possible keywords, and that it will not accept mistyped keywords.

</P>
<P>
If you reply <KBD>?</KBD> to the keyword request, the command gives a
list of all known keywords, from which you may choose. When the
command is prefixed by an argument (<KBD>C-u M-.</KBD>), it does not
update any program source or PO file buffer, and does some simple
keyword management instead. In this case, the command asks for a
keyword, written in full, which becomes a new allowed keyword for
later <KBD>M-.</KBD> commands. Moreover, this new keyword automatically
becomes the <EM>preferred</EM> keyword for later commands. By typing
an already known keyword in response to <KBD>C-u M-.</KBD>, one merely
changes the <EM>preferred</EM> keyword and does nothing more.

</P>
<P>
All keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
when scanning for strings, and strings already marked by any of those
known keywords are automatically skipped. If many PO files are opened
simultaneously, each one has its own independent set of known keywords.
There is currently no provision in PO mode for deleting a known
keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh. When a PO file is newly brought up in an Emacs window, only
<SAMP>‘gettext’</SAMP> and <SAMP>‘_’</SAMP> are known as keywords, and <SAMP>‘gettext’</SAMP>
is preferred for the <KBD>M-.</KBD> command. In fact, it is not useful to
prefer <SAMP>‘_’</SAMP>, as it is already built into the <KBD>M-,</KBD> command.

</P>


<H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">4.6 Special Comments preceding Keywords</A></H2>

<P>
<A NAME="IDX138"></A>
In C programs, strings are often used within calls of functions from the
<CODE>printf</CODE> family. The special thing about these format strings is
that they can contain format specifiers introduced with <KBD>%</KBD>.
Assume
we have the code

</P>

<PRE>
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
</PRE>

<P>
A possible German translation for the above string might be:

</P>

<PRE>
"%d Zeichen lang ist die Zeichenkette `%s'"
</PRE>

<P>
A C programmer, even if he cannot speak German, will recognize that
there is something wrong here. The order of the two format specifiers
has changed, but of course the order of the arguments in the
<CODE>printf</CODE> call has not.
This will most probably lead to problems because now the length of the
string is regarded as the address.

</P>
<P>
To prevent errors at runtime caused by translations, the <CODE>msgfmt</CODE>
tool can check statically whether the arguments in the original and the
translation string match in type and number. If this is not the case
and the <SAMP>‘-c’</SAMP> option has been passed to <CODE>msgfmt</CODE>, <CODE>msgfmt</CODE>
will give an error and refuse to produce a MO file. Thus consistent
use of <SAMP>‘msgfmt -c’</SAMP> will catch the error, so that it cannot
cause problems at runtime.
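</P>

<P>
To make the idea of such a static check concrete, here is a deliberately
simplified sketch in C. It is not how <CODE>msgfmt</CODE> is implemented; the
function names <CODE>conversion_sequence</CODE> and <CODE>same_conversions</CODE>
are hypothetical. It only compares the order of the conversion characters,
ignores length modifiers such as <CODE>l</CODE>, and does not understand
numbered arguments or <CODE>*</CODE> widths.
</P>

<PRE>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store the conversion characters of FMT, in
   order, into OUT (at most OUTSIZE - 1 of them; OUTSIZE must be at
   least 1).  A literal "%%" is skipped.  */
void
conversion_sequence (const char *fmt, char *out, size_t outsize)
{
  size_t n = 0;

  while (*fmt != '\0')
    {
      if (*fmt++ != '%')
        continue;
      if (*fmt == '%')
        {
          fmt++;
          continue;
        }
      /* Skip flags, width, precision and length modifiers.  */
      fmt += strspn (fmt, "-+ #0123456789.hlqLjzt");
      if (*fmt == '\0')
        break;
      if (n + 1 < outsize)
        out[n++] = *fmt;
      fmt++;
    }
  out[n] = '\0';
}

/* Return 1 if MSGID and MSGSTR use the same conversions in the same
   order, 0 otherwise.  */
int
same_conversions (const char *msgid, const char *msgstr)
{
  char a[64], b[64];

  conversion_sequence (msgid, a, sizeof a);
  conversion_sequence (msgstr, b, sizeof b);
  return strcmp (a, b) == 0;
}
</PRE>

<P>
For the example above, <CODE>same_conversions</CODE> returns 0: the original
string yields the sequence <CODE>"sd"</CODE>, the German translation yields
<CODE>"ds"</CODE>.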

</P>
<P>
For the word order in the above German translation to be correct, one
would have to write

</P>

<PRE>
"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
</PRE>

<P>
The routines in <CODE>msgfmt</CODE> know about this special notation.

</P>
<P>
Because not all strings in a program are format strings, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>‘.po’</TT> file.
This might cause problems because a string might contain what looks
like a format specifier, although the string is not used in <CODE>printf</CODE>.

</P>
<P>
Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
thinks might be a format string. There is no absolute rule for this,
only a heuristic. In the <TT>‘.po’</TT> file the entry is marked using the
<CODE>c-format</CODE> flag in the <CODE>#,</CODE> comment line (see section <A HREF="gettext_3.html#SEC10">3 The Format of PO Files</A>).

</P>
<P>
<A NAME="IDX139"></A>
<A NAME="IDX140"></A>
The careful reader now might say that this again can cause problems.
The heuristic might guess wrong. This is true, and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision.
If, on the same line as the <CODE>gettext</CODE> keyword or on
the line immediately preceding it,
the <CODE>xgettext</CODE> program finds a comment containing the words
<CODE>xgettext:c-format</CODE>, it will mark the string in any case with
the <CODE>c-format</CODE> flag. This kind of comment should be used when
<CODE>xgettext</CODE> does not recognize the string as a format string but
it really is one and it should be tested. Please note that when the
comment is on the same line as the <CODE>gettext</CODE> keyword, it must be
before the string to be translated.

</P>
<P>
This situation happens quite often. The <CODE>printf</CODE> function is often
called with strings which do not contain a format specifier. Of course
one would normally use <CODE>fputs</CODE>, but it does happen. In this case
<CODE>xgettext</CODE> does not recognize this as a format string, but what
happens if the translation introduces a valid format specifier? The
<CODE>printf</CODE> function will try to access one of the parameters but none
exists because the original code does not pass any parameters.

</P>
<P>
<CODE>xgettext</CODE> of course could make a wrong decision the other way
round, i.e. a string marked as a format string actually is not a format
string. In this case <CODE>msgfmt</CODE> might give too many warnings and
would prevent translating the <TT>‘.po’</TT> file.
The method to prevent
this wrong decision is similar to the one used above, only the comment
to use must contain the string <CODE>xgettext:no-c-format</CODE>.

</P>
<P>
If a string is marked with <CODE>c-format</CODE> and this is not correct, the
user can find out who is responsible for the decision. See
section <A HREF="gettext_5.html#SEC22">5.1 Invoking the <CODE>xgettext</CODE> Program</A> to see how the <CODE>--debug</CODE> option can be
used for solving this problem.

</P>


<H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">4.7 Special Cases of Translatable Strings</A></H2>

<P>
<A NAME="IDX141"></A>
The attentive reader might now point out that it is not always possible
to mark translatable strings with <CODE>gettext</CODE> or something like this.
Consider the following case:

</P>

<PRE>
{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  ...
  string
    = index > 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  ...
}
</PRE>

<P>
While it is no problem to mark the string <CODE>"a default message"</CODE>, it
is not possible to mark the string initializers for <CODE>messages</CODE>
with a <CODE>gettext</CODE> call, because a function call is not allowed in
the initializer of a static array.
What is to be done? We have to fulfill two tasks. First we have to mark the
strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_5.html#SEC22">5.1 Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second we have to translate the strings at runtime
before printing them.

</P>
<P>
The first task can be fulfilled by creating a new keyword, which names a
no-op. For the second we have to mark all access points to a string
from the array. So one solution can look like this:

</P>

<PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index > 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  ...
}
</PRE>

<P>
Please convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in any case.
How to make <CODE>xgettext</CODE> aware of
the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_5.html#SEC22">5.1 Invoking the <CODE>xgettext</CODE> Program</A>.

</P>
<P>
The above is of course not the only solution. You could also come up
with the following one:

</P>

<PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index > 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  ...
}
</PRE>

<P>
But this has a drawback. The programmer has to take care that
he uses <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
A use of <CODE>gettext</CODE> there could, in rare cases, have unpredictable
results, because the already translated string would then be passed to
<CODE>gettext</CODE> a second time.

</P>
<P>
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case. But this analysis is
generally not very difficult. If it does turn out to be difficult in some
situation, you can use this second method there.
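</P>

<P>
For reference, the first variant can be written as a complete function
along the following lines. This is a sketch under assumptions: the function
name <CODE>message_for</CODE> is hypothetical, the range check is extended to
negative values of <CODE>index</CODE>, and <CODE>libintl.h</CODE> is assumed to
be available on the system.
</P>

<PRE>
#include <libintl.h>

#define gettext_noop(String) String

/* Hypothetical helper: return the translated message number INDEX,
   or a translated default message if INDEX is out of range.  */
const char *
message_for (int index)
{
  /* gettext_noop only marks the strings for xgettext; it expands to
     the plain string literal, which is valid in a static initializer.  */
  static const char *const messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };

  if (index < 0 || index > 1)
    return gettext ("a default message");

  /* The lookup happens here, at runtime, exactly once per string.  */
  return gettext (messages[index]);
}
</PRE>

<P>
When no message catalog is installed, <CODE>gettext</CODE> returns its
argument unchanged, so <CODE>message_for (0)</CODE> then simply yields the
original English string.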

</P>


<H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">4.8 Marking Proper Names for Translation</A></H2>

<P>
Should names of persons, cities, locations etc. be marked for translation
or not? People who only know languages that can be written with Latin
letters (English, Spanish, French, German, etc.) are tempted to say “no”,
because names usually do not change when transported between these languages.
However, in general when translating from one script to another, names
are translated too, usually phonetically or by transliteration. For
example, Russian or Greek names are converted to the Latin alphabet when
being translated to English, and English or French names are converted
to the Katakana script when being translated to Japanese. This is
necessary because the speakers of the target language in general cannot
read the script the name is originally written in.

</P>
<P>
As a programmer, you should therefore make sure that names are marked
for translation, with a special comment telling the translators that it
is a proper name and how to pronounce it. Like this:

</P>

<PRE>
printf (_("Written by %s.\n"),
        /* TRANSLATORS: This is a proper name.  See the gettext
           manual, section Names.  Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar".  */
        _("Francois Pinard"));
</PRE>

<P>
As a translator, you should use some care when translating names, because
it is frustrating if people see their names mutilated or distorted. If
your language uses the Latin script, all you need to do is to reproduce
the name as perfectly as you can within the usual character set of your
language. In this particular case, this means providing a translation
containing the c-cedilla character. If your language uses a different
script and the people speaking it don't usually read Latin words, it means
transliteration; but you should still give, in parentheses, the original
writing of the name, for the sake of the people that do read the Latin
script. Here is an example, using Greek as the target script:

</P>

<PRE>
#. This is a proper name.  See the gettext
#. manual, section Names.  Note this is actually a non-ASCII
#. name: The first name is (with Unicode escapes)
#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
#. Pronunciation is like "fraa-swa pee-nar".
msgid "Francois Pinard"
msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
       " (Francois Pinard)"
</PRE>

<P>
Because translation of names is such a sensitive domain, it is a good
idea to test your translation before submitting it.

</P>
<P>
The translation project <A HREF="http://sourceforge.net/projects/translation">http://sourceforge.net/projects/translation</A>
has set up a POT file and translation domain consisting of program author
names, with better facilities for the translator than those presented here.
Namely, there the original name is written directly in Unicode (rather
than with Unicode escapes or HTML entities), and the pronunciation is
denoted using the International Phonetic Alphabet (see
<A HREF="http://www.wikipedia.org/wiki/International_Phonetic_Alphabet">http://www.wikipedia.org/wiki/International_Phonetic_Alphabet</A>).

</P>
<P>
However, we don't recommend this approach for all POT files in all packages,
because this would force translators to use PO files in UTF-8 encoding,
which is - in the current state of software (as of 2003) - a major hassle
for translators using GNU Emacs or XEmacs with po-mode.

</P>


<H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">4.9 Preparing Library Sources</A></H2>

<P>
When you are preparing a library, not a program, for the use of
<CODE>gettext</CODE>, only a few details are different. Here we assume that
the library has a translation domain and a POT file of its own. (If
it uses the translation domain and POT file of the main program, then
the previous sections apply without changes.)

</P>

<OL>
<LI>

The library code doesn't call <CODE>setlocale (LC_ALL, "")</CODE>. It's the
responsibility of the main program to set the locale. The library's
documentation should mention this fact, so that developers of programs
using the library are aware of it.

<LI>

The library code doesn't call <CODE>textdomain (PACKAGE)</CODE>, because it
would interfere with the text domain set by the main program.
1129*946379e7Schristos 1130*946379e7Schristos<LI> 1131*946379e7Schristos 1132*946379e7SchristosThe initialization code for a program was 1133*946379e7Schristos 1134*946379e7Schristos 1135*946379e7Schristos<PRE> 1136*946379e7Schristos setlocale (LC_ALL, ""); 1137*946379e7Schristos bindtextdomain (PACKAGE, LOCALEDIR); 1138*946379e7Schristos textdomain (PACKAGE); 1139*946379e7Schristos</PRE> 1140*946379e7Schristos 1141*946379e7SchristosFor a library it is reduced to 1142*946379e7Schristos 1143*946379e7Schristos 1144*946379e7Schristos<PRE> 1145*946379e7Schristos bindtextdomain (PACKAGE, LOCALEDIR); 1146*946379e7Schristos</PRE> 1147*946379e7Schristos 1148*946379e7SchristosIf your library's API doesn't already have an initialization function, 1149*946379e7Schristosyou need to create one, containing at least the <CODE>bindtextdomain</CODE> 1150*946379e7Schristosinvocation. However, you usually don't need to export and document this 1151*946379e7Schristosinitialization function: It is sufficient that all entry points of the 1152*946379e7Schristoslibrary call the initialization function if it hasn't been called before. 1153*946379e7SchristosThe typical idiom used to achieve this is a static boolean variable that 1154*946379e7Schristosindicates whether the initialization function has been called. Like this: 1155*946379e7Schristos 1156*946379e7Schristos 1157*946379e7Schristos<PRE> 1158*946379e7Schristosstatic bool libfoo_initialized; 1159*946379e7Schristos 1160*946379e7Schristosstatic void 1161*946379e7Schristoslibfoo_initialize (void) 1162*946379e7Schristos{ 1163*946379e7Schristos bindtextdomain (PACKAGE, LOCALEDIR); 1164*946379e7Schristos libfoo_initialized = true; 1165*946379e7Schristos} 1166*946379e7Schristos 1167*946379e7Schristos/* This function is part of the exported API. */ 1168*946379e7Schristosstruct foo * 1169*946379e7Schristoscreate_foo (...) 1170*946379e7Schristos{ 1171*946379e7Schristos /* Must ensure the initialization is performed. 
     */
  if (!libfoo_initialized)
    libfoo_initialize ();
  ...
}

/* This function is part of the exported API.  The argument must be
   non-NULL and have been created through create_foo().  */
int
foo_refcount (struct foo *argument)
{
  /* No need to invoke the initialization function here, because
     create_foo() must already have been called before.  */
  ...
}
</PRE>

<LI>

The usual declaration of the <SAMP>‘_’</SAMP> macro in each source file was


<PRE>
#include &lt;libintl.h&gt;
#define _(String) gettext (String)
</PRE>

for a program.  For a library, which has its own translation domain,
it reads like this:


<PRE>
#include &lt;libintl.h&gt;
#define _(String) dgettext (PACKAGE, String)
</PRE>

In other words, <CODE>dgettext</CODE> is used instead of <CODE>gettext</CODE>.
Similarly, the <CODE>dngettext</CODE> function should be used in place of the
<CODE>ngettext</CODE> function.
</OL>

</BODY>
</HTML>