<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 27 November 2006 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 4  Preparing Program Sources</TITLE>
</HEAD>
<BODY>
11*946379e7Schristos<P><HR><P>
12*946379e7Schristos
13*946379e7Schristos
<H1><A NAME="SEC11" HREF="gettext_toc.html#TOC11">4  Preparing Program Sources</A></H1>
<P>
<A NAME="IDX102"></A>

</P>

<P>
For the programmer, changes to the C source code fall into three
categories.  First, you have to make the localization functions
known to all modules needing message translation.  Second, you should
properly trigger the operation of GNU <CODE>gettext</CODE> when the program
initializes, usually from the <CODE>main</CODE> function.  Last, you should
identify, adjust and mark all constant strings in your program
needing translation.

</P>
30*946379e7Schristos
31*946379e7Schristos
32*946379e7Schristos
<H2><A NAME="SEC12" HREF="gettext_toc.html#TOC12">4.1  Importing the <CODE>gettext</CODE> declaration</A></H2>

<P>
Presuming that your set of programs, or package, has been adjusted
so all needed GNU <CODE>gettext</CODE> files are available, and your
<TT>&lsquo;Makefile&rsquo;</TT> files are adjusted (see section <A HREF="gettext_13.html#SEC196">13  The Maintainer's View</A>), each C module
having translated C strings should contain the line:

</P>
<P>
<A NAME="IDX103"></A>

<PRE>
#include &#60;libintl.h&#62;
</PRE>

<P>
Similarly, each C module containing <CODE>printf()</CODE>/<CODE>fprintf()</CODE>/...
calls with a format string that could be a translated C string (even if
the C string comes from a different C module) should contain the line:

</P>

<PRE>
#include &#60;libintl.h&#62;
</PRE>
59*946379e7Schristos
60*946379e7Schristos
61*946379e7Schristos
<H2><A NAME="SEC13" HREF="gettext_toc.html#TOC13">4.2  Triggering <CODE>gettext</CODE> Operations</A></H2>

<P>
<A NAME="IDX104"></A>
The initialization of locale data should be done with more or less
the same code in every program, as demonstrated below:

</P>

<PRE>
int
main (int argc, char *argv[])
{
  ...
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  ...
}
</PRE>
82*946379e7Schristos
83*946379e7Schristos<P>
84*946379e7Schristos<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
85*946379e7Schristos<TT>&lsquo;config.h&rsquo;</TT> or by the Makefile.  For now consult the <CODE>gettext</CODE>
86*946379e7Schristosor <CODE>hello</CODE> sources for more information.
87*946379e7Schristos
88*946379e7Schristos</P>
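<P>
As a rough sketch of how <VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> might be
supplied, the fragment below falls back to placeholder values when the build
system has not defined them.  The values <CODE>"hello"</CODE> and
<CODE>"/usr/share/locale"</CODE> and the helper name <CODE>init_i18n</CODE> are
assumptions made for illustration only; a real package would take the two
macros from <TT>&lsquo;config.h&rsquo;</TT> or from the Makefile.

</P>

```c
#include <libintl.h>
#include <locale.h>

/* Placeholder fallbacks (assumptions); normally config.h or the
   Makefile defines these, e.g. with -DLOCALEDIR=\"...\".  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Hypothetical helper: performs the three initialization calls and
   returns the name of the now-current text domain, or NULL if
   textdomain() failed.  */
const char *
init_i18n (void)
{
  /* Adopt the locale named by the environment (LANG, LC_*).  */
  setlocale (LC_ALL, "");
  /* Tell gettext where the message catalogs of PACKAGE live.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  /* Make PACKAGE the default domain for later gettext() calls.  */
  return textdomain (PACKAGE);
}
```

<P>
A program would call <CODE>init_i18n ()</CODE> once, near the top of
<CODE>main</CODE>, before producing any translatable output.

</P>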
<P>
<A NAME="IDX105"></A>
<A NAME="IDX106"></A>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and especially
<CODE>LC_CTYPE</CODE>.  This latter category determines the character
classes reported by <CODE>isalnum</CODE> and the other functions from
<TT>&lsquo;ctype.h&rsquo;</TT>, and a locale-dependent classification can be
wrong, especially for programs that process some kind of input
language.  For example, it would mean that source code using the
&ccedil; (c-cedilla character) is runnable in France but not in the U.S.

</P>
<P>
Some systems also have problems with parsing numbers using the
<CODE>scanf</CODE> functions if a locale other than the <CODE>"C"</CODE>
locale is in effect.  The standards say that formats in addition to
the one known in the <CODE>"C"</CODE> locale might be recognized.  But
some systems seem to reject numbers in the <CODE>"C"</CODE> locale
format.  In some situations, the notation itself makes it impossible
to recognize whether the number is in the <CODE>"C"</CODE> locale format
or the local format.  This can happen if thousands separator characters
are used.  Some locales define this character, according to the
national conventions, as <CODE>'.'</CODE>, which is the same character
used in the <CODE>"C"</CODE> locale to denote the decimal point.

</P>
<P>
So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
code above by a sequence of <CODE>setlocale</CODE> lines:

</P>

<PRE>
{
  ...
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  ...
}
</PRE>
130*946379e7Schristos
131*946379e7Schristos<P>
132*946379e7Schristos<A NAME="IDX107"></A>
133*946379e7Schristos<A NAME="IDX108"></A>
134*946379e7Schristos<A NAME="IDX109"></A>
135*946379e7Schristos<A NAME="IDX110"></A>
136*946379e7Schristos<A NAME="IDX111"></A>
137*946379e7Schristos<A NAME="IDX112"></A>
138*946379e7Schristos<A NAME="IDX113"></A>
139*946379e7SchristosOn all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
140*946379e7Schristos<CODE>LC_MESSAGES</CODE>, <CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>,
141*946379e7Schristos<CODE>LC_NUMERIC</CODE>, and <CODE>LC_TIME</CODE> are available.  On some systems
142*946379e7Schristoswhich are only ISO C compliant, <CODE>LC_MESSAGES</CODE> is missing, but
143*946379e7Schristosa substitute for it is defined in GNU gettext's <CODE>&#60;libintl.h&#62;</CODE>.
144*946379e7Schristos
145*946379e7Schristos</P>
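<P>
The effect of setting only selected categories can be sketched as follows.
The helper name <CODE>init_message_locale</CODE> is an assumption; the sketch
relies on <CODE>LC_MESSAGES</CODE> being available, either from
<CODE>&#60;locale.h&#62;</CODE> or from the substitute in
<CODE>&#60;libintl.h&#62;</CODE>.

</P>

```c
#include <libintl.h>   /* may supply LC_MESSAGES where <locale.h> lacks it */
#include <locale.h>

/* Hypothetical helper: take character classification and message
   language from the environment, but leave every other category,
   in particular LC_NUMERIC, in the "C" locale the program starts in.
   Returns the name of the LC_NUMERIC locale still in effect.  */
const char *
init_message_locale (void)
{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  /* A NULL second argument only queries the current setting.  */
  return setlocale (LC_NUMERIC, NULL);
}
```

<P>
Because <CODE>LC_NUMERIC</CODE> is never changed here, <CODE>scanf</CODE> and
<CODE>printf</CODE> keep using <CODE>'.'</CODE> as the decimal point.

</P>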
146*946379e7Schristos<P>
147*946379e7SchristosNote that changing the <CODE>LC_CTYPE</CODE> also affects the functions
148*946379e7Schristosdeclared in the <CODE>&#60;ctype.h&#62;</CODE> standard header.  If this is not
149*946379e7Schristosdesirable in your application (for example in a compiler's parser),
150*946379e7Schristosyou can use a set of substitute functions which hardwire the C locale,
151*946379e7Schristossuch as found in the <CODE>&#60;c-ctype.h&#62;</CODE> and <CODE>&#60;c-ctype.c&#62;</CODE> files
152*946379e7Schristosin the gettext source distribution.
153*946379e7Schristos
154*946379e7Schristos</P>
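<P>
To illustrate what hardwiring the C locale means, here is a minimal sketch
of such substitute functions.  It is not the code from the gettext
distribution; the names <CODE>my_c_isdigit</CODE>, <CODE>my_c_isalpha</CODE>
and <CODE>my_c_isalnum</CODE> are invented for this example, and an
ASCII-compatible execution character set is assumed.

</P>

```c
#include <stdbool.h>

/* Illustrative stand-ins (not the gettext c-ctype implementation).
   They classify by fixed ASCII ranges and never consult the locale,
   so the result is the same whatever setlocale() was called with.
   Assumes an ASCII-compatible execution character set.  */
bool
my_c_isdigit (int c)
{
  return c >= '0' && c <= '9';
}

bool
my_c_isalpha (int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

bool
my_c_isalnum (int c)
{
  return my_c_isalpha (c) || my_c_isdigit (c);
}
```

<P>
A parser would call <CODE>my_c_isalnum (ch)</CODE> where it previously called
<CODE>isalnum (ch)</CODE>; the byte 0xE7 (&ccedil; in ISO-8859-1) is then
rejected in every locale.

</P>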
<P>
It is also possible to switch the locale back and forth between the
environment-dependent locale and the C locale, but this approach is
normally avoided because a <CODE>setlocale</CODE> call is expensive,
because it is tedious to determine the places where a locale switch
is needed in a large program's source, and because switching a locale
is not multithread-safe.

</P>
164*946379e7Schristos
165*946379e7Schristos
<H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">4.3  Preparing Translatable Strings</A></H2>

<P>
<A NAME="IDX114"></A>
Before strings can be marked for translation, they sometimes need to
be adjusted.  Usually preparing a string for translation is done right
before marking it, during the marking phase which is described in the
next sections.  While doing that, keep the following guidelines in mind.

</P>
177*946379e7Schristos
178*946379e7Schristos<UL>
179*946379e7Schristos<LI>
180*946379e7Schristos
181*946379e7SchristosDecent English style.
182*946379e7Schristos
183*946379e7Schristos<LI>
184*946379e7Schristos
185*946379e7SchristosEntire sentences.
186*946379e7Schristos
187*946379e7Schristos<LI>
188*946379e7Schristos
189*946379e7SchristosSplit at paragraphs.
190*946379e7Schristos
191*946379e7Schristos<LI>
192*946379e7Schristos
193*946379e7SchristosUse format strings instead of string concatenation.
194*946379e7Schristos
195*946379e7Schristos<LI>
196*946379e7Schristos
197*946379e7SchristosAvoid unusual markup and unusual control characters.
198*946379e7Schristos</UL>
199*946379e7Schristos
200*946379e7Schristos<P>
201*946379e7SchristosLet's look at some examples of these guidelines.
202*946379e7Schristos
203*946379e7Schristos</P>
<P>
<A NAME="IDX115"></A>
Translatable strings should be in good English style.  If slang
with abbreviations and shortcuts is used, translators will often not
understand the message and will produce inappropriate translations.

</P>

<PRE>
"%s: is parameter\n"
</PRE>

<P>
This is nearly untranslatable: Is the displayed item <EM>a</EM> parameter or
<EM>the</EM> parameter?

</P>

<PRE>
"No match"
</PRE>

<P>
The ambiguity in this message makes it unintelligible: Is the program
attempting to set something on fire? Does it mean "The given object does
not match the template"? Does it mean "The template does not fit any
of the objects"?

</P>
<P>
<A NAME="IDX116"></A>
In both cases, adding more words to the message will help both the
translator and the English-speaking user.

</P>
<P>
<A NAME="IDX117"></A>
Translatable strings should be entire sentences.  It is often not possible
to translate single verbs or adjectives in a substitutable way.

</P>

<PRE>
printf ("File %s is %s protected", filename, rw ? "write" : "read");
</PRE>

<P>
Most translators will not look at the source and will thus only see the
string <CODE>"File %s is %s protected"</CODE>, which is unintelligible.  Change
this to

</P>

<PRE>
printf (rw ? "File %s is write protected" : "File %s is read protected",
        filename);
</PRE>

<P>
This way the translator will not only understand the message, she will
also be able to find the appropriate grammatical construction.  A French
translator, for example, translates "write protected" as "protected
against writing".

</P>
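<P>
Once marked for translation, the corrected code could look like the sketch
below.  The function name <CODE>protection_message</CODE> is an assumption.
Each complete sentence gets its own <CODE>gettext</CODE> call, so that both
sentences are extracted as separate messages.

</P>

```c
#include <libintl.h>

/* Hypothetical helper: returns the (translated) format string for a
   protected file.  Each branch is a complete sentence wrapped in its
   own gettext() call.  */
const char *
protection_message (int rw)
{
  return rw ? gettext ("File %s is write protected")
            : gettext ("File %s is read protected");
}
```

<P>
The caller then writes <CODE>printf (protection_message (rw), filename);</CODE>.

</P>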
<P>
Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence.  There are
usually more interdependencies between words than in English.  The
consequence is that asking a translator to translate two half-sentences
and then combining these two half-sentences through dumb string concatenation
will not work for many languages, even though it would work for English.
That's why translators need to handle entire sentences.

</P>
<P>
Often sentences don't fit into a single line.  If a sentence is output
using two subsequent <CODE>printf</CODE> statements, like this

</P>

<PRE>
printf ("Locale charset \"%s\" is different from\n", lcharset);
printf ("input file charset \"%s\".\n", fcharset);
</PRE>

<P>
the translator would have to translate two half sentences, but nothing
in the POT file would tell her that the two half sentences belong together.
It is necessary to merge the two <CODE>printf</CODE> statements so that the
translator can handle the entire sentence at once and decide at which
place to insert a line break in the translation (if at all):

</P>

<PRE>
printf ("Locale charset \"%s\" is different from\n\
input file charset \"%s\".\n", lcharset, fcharset);
</PRE>
304*946379e7Schristos
<P>
You may now ask: how about two or more adjacent sentences, as in this case?

</P>

<PRE>
puts ("Apollo 13 scenario: Stack overflow handling failed.");
puts ("On the next stack overflow we will crash!!!");
</PRE>

<P>
Should these two statements be merged into a single one? I would recommend
merging them if the two sentences are related to each other, because that
makes it easier for the translator to understand and translate both.  On
the other hand, if one of the two messages is a stereotypic one, occurring
in other places as well, you will do the translator a favour by not
merging the two.  (Identical messages occurring in several places are
combined by xgettext, so the translator has to handle them only once.)

</P>
325*946379e7Schristos<P>
326*946379e7Schristos<A NAME="IDX118"></A>
327*946379e7SchristosTranslatable strings should be limited to one paragraph; don't let a
328*946379e7Schristossingle message be longer than ten lines.  The reason is that when the
329*946379e7Schristostranslatable string changes, the translator is faced with the task of
330*946379e7Schristosupdating the entire translated string.  Maybe only a single word will
331*946379e7Schristoshave changed in the English string, but the translator doesn't see that
332*946379e7Schristos(with the current translation tools), therefore she has to proofread
333*946379e7Schristosthe entire message.
334*946379e7Schristos
335*946379e7Schristos</P>
336*946379e7Schristos<P>
337*946379e7Schristos<A NAME="IDX119"></A>
338*946379e7SchristosMany GNU programs have a <SAMP>&lsquo;--help&rsquo;</SAMP> output that extends over several
339*946379e7Schristosscreen pages.  It is a courtesy towards the translators to split such a
340*946379e7Schristosmessage into several ones of five to ten lines each.  While doing that,
341*946379e7Schristosyou can also attempt to split the documented options into groups,
342*946379e7Schristossuch as the input options, the output options, and the informative
343*946379e7Schristosoutput options.  This will help every user to find the option he is
344*946379e7Schristoslooking for.
345*946379e7Schristos
346*946379e7Schristos</P>
<P>
<A NAME="IDX120"></A>
<A NAME="IDX121"></A>
Hardcoded string concatenation is sometimes used to construct English
strings:

</P>

<PRE>
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");
</PRE>

<P>
In order to present to the translator only entire sentences, and also
because in some languages the translator might want to swap the order
of <CODE>object1</CODE> and <CODE>object2</CODE>, it is necessary to change this
to use a format string:

</P>

<PRE>
sprintf (s, "Replace %s with %s?", object1, object2);
</PRE>
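<P>
The benefit for word order can be sketched with POSIX positional directives
(<CODE>%1$s</CODE>, <CODE>%2$s</CODE>), which let a format string refer to its
arguments in a different order.  The helper name <CODE>format_prompt</CODE> and
the reordered string are invented for illustration; the latter only stands in
for what a translator might write and is not a real translation.

</P>

```c
#include <stdio.h>

/* Hypothetical helper: formats a prompt from a format string that
   takes two string arguments, using snprintf() so that the buffer
   cannot overflow.  Returns the length snprintf() reports.  */
int
format_prompt (char *buf, size_t size, const char *format,
               const char *object1, const char *object2)
{
  return snprintf (buf, size, format, object1, object2);
}

/* The English original, and a made-up stand-in for a translation
   that needs the two objects in the opposite order.  Positional
   directives are a POSIX extension to ISO C printf.  */
const char *const english_format = "Replace %s with %s?";
const char *const reordered_format = "Use %2$s in place of %1$s?";
```

<P>
With plain concatenation the order of the two objects is fixed in the code;
with a format string it is part of the translatable message.

</P>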
374*946379e7Schristos
<P>
<A NAME="IDX122"></A>
A similar case is compile time concatenation of strings.  The ISO C 99
include file <CODE>&#60;inttypes.h&#62;</CODE> contains a macro <CODE>PRId64</CODE> that
can be used as a formatting directive for outputting an <SAMP>&lsquo;int64_t&rsquo;</SAMP>
integer through <CODE>printf</CODE>.  It expands to a constant string, usually
"d" or "ld" or "lld" or something like this, depending on the platform.
Assume you have code like

</P>

<PRE>
printf ("The amount is %0" PRId64 "\n", number);
</PRE>

<P>
The <CODE>gettext</CODE> tools and library have special support for these
<CODE>&#60;inttypes.h&#62;</CODE> macros.  You can therefore simply write

</P>

<PRE>
printf (gettext ("The amount is %0" PRId64 "\n"), number);
</PRE>
399*946379e7Schristos
400*946379e7Schristos<P>
401*946379e7SchristosThe PO file will contain the string "The amount is %0&#60;PRId64&#62;\n".
402*946379e7SchristosThe translators will provide a translation containing "%0&#60;PRId64&#62;"
403*946379e7Schristosas well, and at runtime the <CODE>gettext</CODE> function's result will
404*946379e7Schristoscontain the appropriate constant string, "d" or "ld" or "lld".
405*946379e7Schristos
406*946379e7Schristos</P>
<P>
This works only for the predefined <CODE>&#60;inttypes.h&#62;</CODE> macros.  If
you have defined your own similar macros, let's say <SAMP>&lsquo;MYPRId64&rsquo;</SAMP>,
that are not known to <CODE>xgettext</CODE>, the solution for this problem
is to change the code like this:

</P>

<PRE>
char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);
</PRE>

<P>
That is, you put the platform dependent code in one statement, and the
internationalization code in a different statement.  Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128 bit integer one needs at most 54 characters,
regardless of whether it is printed in decimal, octal or hexadecimal.

</P>
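<P>
A bounded variant of the same split, as a sketch: <CODE>MYPRId64</CODE> is
defined here as an alias of <CODE>PRId64</CODE> purely to make the example
self-contained, and <CODE>format_amount</CODE> is an invented helper name.

</P>

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Stand-in for a project-specific macro unknown to xgettext
   (assumption: here it simply aliases the standard PRId64).  */
#define MYPRId64 PRId64

/* Hypothetical helper: writes the translated message into OUT and
   returns the length snprintf() reports.  */
int
format_amount (char *out, size_t size, int64_t number)
{
  char buf1[100];

  /* Platform dependent part: no translatable text involved.  */
  snprintf (buf1, sizeof buf1, "%0" MYPRId64, number);
  /* Internationalized part: the translator only sees "%s".  */
  return snprintf (out, size, gettext ("The amount is %s\n"), buf1);
}
```

<P>
Using <CODE>snprintf</CODE> instead of <CODE>sprintf</CODE> is a design choice of
this sketch; it keeps both steps within their buffers even if the translated
message is longer than expected.

</P>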
<P>
<A NAME="IDX123"></A>
<A NAME="IDX124"></A>
All this applies to other programming languages as well.  For example, in
Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator.  As in C, in Java you would change

</P>

<PRE>
System.out.println("Replace "+object1+" with "+object2+"?");
</PRE>

<P>
into a statement involving a format string:

</P>

<PRE>
System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));
</PRE>

<P>
Similarly, in C#, you would change

</P>

<PRE>
Console.WriteLine("Replace "+object1+" with "+object2+"?");
</PRE>

<P>
into a statement involving a format string:

</P>

<PRE>
Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));
</PRE>
471*946379e7Schristos
472*946379e7Schristos<P>
473*946379e7Schristos<A NAME="IDX125"></A>
474*946379e7Schristos<A NAME="IDX126"></A>
475*946379e7SchristosUnusual markup or control characters should not be used in translatable
476*946379e7Schristosstrings.  Translators will likely not understand the particular meaning
477*946379e7Schristosof the markup or control characters.
478*946379e7Schristos
479*946379e7Schristos</P>
480*946379e7Schristos<P>
481*946379e7SchristosFor example, if you have a convention that <SAMP>&lsquo;|&rsquo;</SAMP> delimits the
482*946379e7Schristosleft-hand and right-hand part of some GUI elements, translators will
483*946379e7Schristosoften not understand it without specific comments.  It might be
484*946379e7Schristosbetter to have the translator translate the left-hand and right-hand
485*946379e7Schristospart separately.
486*946379e7Schristos
487*946379e7Schristos</P>
488*946379e7Schristos<P>
489*946379e7SchristosAnother example is the <SAMP>&lsquo;argp&rsquo;</SAMP> convention to use a single <SAMP>&lsquo;\v&rsquo;</SAMP>
490*946379e7Schristos(vertical tab) control character to delimit two sections inside a
491*946379e7Schristosstring.  This is flawed.  Some translators may convert it to a simple
492*946379e7Schristosnewline, some to blank lines.  With some PO file editors it may not be
493*946379e7Schristoseasy to even enter a vertical tab control character.  So, you cannot
494*946379e7Schristosbe sure that the translation will contain a <SAMP>&lsquo;\v&rsquo;</SAMP> character, at the
495*946379e7Schristoscorresponding position.  The solution is, again, to let the translator
496*946379e7Schristostranslate two separate strings and combine at run-time the two translated
497*946379e7Schristosstrings with the <SAMP>&lsquo;\v&rsquo;</SAMP> required by the convention.
498*946379e7Schristos
499*946379e7Schristos</P>
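<P>
A minimal sketch of that run-time combination follows.  The two
documentation strings and the helper name <CODE>build_argp_doc</CODE> are made
up for the example; only the <SAMP>&lsquo;\v&rsquo;</SAMP> separator comes from the
convention described above.

</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: translates the two sections separately and
   joins them with the vertical tab that the convention requires, so
   that no translator ever has to type or preserve a '\v'.
   Returns the length snprintf() reports.  */
int
build_argp_doc (char *buf, size_t size)
{
  return snprintf (buf, size, "%s\v%s",
                   gettext ("Frobnicate the input files."),
                   gettext ("Report bugs to the maintainer."));
}
```

<P>
The separator lives in the untranslated format <CODE>"%s\v%s"</CODE>, so it
appears exactly once regardless of what the translations contain.

</P>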
<P>
HTML markup, however, is common enough that it's probably OK to use in
translatable strings.  But please bear in mind that the GNU gettext tools
don't verify that the translations are well-formed HTML.

</P>
506*946379e7Schristos
507*946379e7Schristos
508*946379e7Schristos<H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">4.4  How Marks Appear in Sources</A></H2>
509*946379e7Schristos<P>
510*946379e7Schristos<A NAME="IDX127"></A>
511*946379e7Schristos
512*946379e7Schristos</P>
513*946379e7Schristos<P>
514*946379e7SchristosAll strings requiring translation should be marked in the C sources.  Marking
515*946379e7Schristosis done in such a way that each translatable string appears to be
516*946379e7Schristosthe sole argument of some function or preprocessor macro.  There are
517*946379e7Schristosonly a few such possible functions or macros meant for translation,
518*946379e7Schristosand their names are said to be marking keywords.  The marking is
519*946379e7Schristosattached to strings themselves, rather than to what we do with them.
520*946379e7SchristosThis approach has more uses.  A blatant example is an error message
521*946379e7Schristosproduced by formatting.  The format string needs translation, as
522*946379e7Schristoswell as some strings inserted through some <SAMP>&lsquo;%s&rsquo;</SAMP> specification
523*946379e7Schristosin the format, while the result from <CODE>sprintf</CODE> may have so many
524*946379e7Schristosdifferent instances that it is impractical to list them all in some
525*946379e7Schristos<SAMP>&lsquo;error_string_out()&rsquo;</SAMP> routine, say.
526*946379e7Schristos
527*946379e7Schristos</P>
<P>
This marking operation has two goals.  The first goal of marking
is to trigger the retrieval of the translation at run time.
The keyword is possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string.  Most localizable strings are found in executable
positions, that is, attached to variables or given as parameters to
functions.  But this is not universal usage, and some translatable
strings appear in structured initializations.  See section <A HREF="gettext_4.html#SEC18">4.7  Special Cases of Translatable Strings</A>.

</P>
<P>
The second goal of the marking operation is to help <CODE>xgettext</CODE>
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.

</P>
<P>
The canonical keyword for marking translatable strings is
<SAMP>&lsquo;gettext&rsquo;</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
package.  For packages making only light use of the <SAMP>&lsquo;gettext&rsquo;</SAMP>
keyword, macro or function, it is easily used <EM>as is</EM>.  However,
for packages using the <CODE>gettext</CODE> interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name.  Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them forcefully, all the time, that they
are internationalized.  Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.

</P>
559*946379e7Schristos</P>
<P>
<A NAME="IDX128"></A>
Many packages use <SAMP>&lsquo;_&rsquo;</SAMP> (a simple underscore) as a keyword,
and write <SAMP>&lsquo;_("Translatable string")&rsquo;</SAMP> instead of <SAMP>&lsquo;gettext
("Translatable string")&rsquo;</SAMP>.  Further, the rule from the GNU coding
standards that requires a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underscore and the two parentheses.
However, even if GNU <CODE>gettext</CODE> uses this convention internally,
it does not offer it officially.  The real keyword is still
<SAMP>&lsquo;gettext&rsquo;</SAMP>.  It is fairly easy for those wanting to use
<SAMP>&lsquo;_&rsquo;</SAMP> instead of <SAMP>&lsquo;gettext&rsquo;</SAMP> to declare:

</P>

<PRE>
#include &#60;libintl.h&#62;
#define _(String) gettext (String)
</PRE>

<P>
instead of merely using <SAMP>&lsquo;#include &#60;libintl.h&#62;&rsquo;</SAMP>.

</P>
585*946379e7Schristos<P>
586*946379e7SchristosThe marking keywords <SAMP>&lsquo;gettext&rsquo;</SAMP> and <SAMP>&lsquo;_&rsquo;</SAMP> take the translatable
587*946379e7Schristosstring as sole argument.  It is also possible to define marking functions
588*946379e7Schristosthat take it at another argument position.  It is even possible to make
589*946379e7Schristosthe marked argument position depend on the total number of arguments of
590*946379e7Schristosthe function call; this is useful in C++.  All this is achieved using
591*946379e7Schristos<CODE>xgettext</CODE>'s <SAMP>&lsquo;--keyword&rsquo;</SAMP> option.
592*946379e7Schristos
593*946379e7Schristos</P>
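<P>
As a hedged illustration of a marking function that takes the
translatable string at another argument position, consider the sketch
below.  The function name <CODE>format_diagnostic</CODE> and the file name in
the comment are hypothetical; the point is only that the string to
extract is the third argument, which <CODE>xgettext</CODE> can be told through
its <SAMP>&lsquo;--keyword&rsquo;</SAMP> option.
</P>

```c
#include <libintl.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical marking function: the translatable format string is the
   THIRD argument.  To extract it, xgettext would be invoked with
   something like:  xgettext --keyword=format_diagnostic:3 file.c  */
static int
format_diagnostic (char *buf, size_t size, const char *msgid, int line)
{
  /* Translate at run time, then format into the caller's buffer.  */
  return snprintf (buf, size, gettext (msgid), line);
}
```

<P>
A call such as <CODE>format_diagnostic (buf, sizeof buf, "line %d: syntax error", 7)</CODE>
then needs no additional marking at the call site.
</P>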
594*946379e7Schristos<P>
595*946379e7SchristosNote also that long strings can be split across lines, into multiple
596*946379e7Schristosadjacent string tokens.  Automatic string concatenation is performed
597*946379e7Schristosat compile time according to ISO C and ISO C++; <CODE>xgettext</CODE> also
598*946379e7Schristossupports this syntax.
599*946379e7Schristos
600*946379e7Schristos</P>
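<P>
A minimal sketch of such a split string follows.  The program name
<CODE>frob</CODE> and the function <CODE>usage_text</CODE> are made up for the
example; the <SAMP>&lsquo;_&rsquo;</SAMP> macro is the one declared earlier.
</P>

```c
#include <libintl.h>

#define _(String) gettext (String)

/* The adjacent string tokens below are concatenated by the compiler
   into one literal, so gettext receives a single message.  */
static const char *
usage_text (void)
{
  return _("Usage: frob [OPTION]... FILE\n"
           "Process FILE and write the result to standard output.\n");
}
```

<P>
Translators then see the whole text as one entry rather than as two
unrelated fragments.
</P>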
601*946379e7Schristos<P>
Later on, maintenance is relatively easy.  If, as a programmer,
you add or modify a string, you will have to ask yourself whether the
new or altered string requires translation, and enclose it within
<SAMP>&lsquo;_()&rsquo;</SAMP> if you think it should be translated.  For example, <SAMP>&lsquo;"%s"&rsquo;</SAMP>
is a string <EM>not</EM> requiring translation.  But
<SAMP>&lsquo;"%s: %d"&rsquo;</SAMP> <EM>does</EM> require translation, because in French, unlike
in English, it's customary to put a space before a colon.
609*946379e7Schristos
610*946379e7Schristos</P>
611*946379e7Schristos
612*946379e7Schristos
613*946379e7Schristos<H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">4.5  Marking Translatable Strings</A></H2>
614*946379e7Schristos<P>
615*946379e7Schristos<A NAME="IDX129"></A>
616*946379e7Schristos
617*946379e7Schristos</P>
618*946379e7Schristos<P>
In PO mode, one set of features is meant more for the programmer than
for the translator.  It lets the programmer interactively mark which
strings, in a set of program sources, are translatable, and which are not.
Even though it is a fairly easy job for a programmer to find and mark
such strings by other means, using any editor, PO mode
makes this work more comfortable.  Further, this gives translators
who feel a little like programmers, or programmers who feel a little
like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.
629*946379e7Schristos
630*946379e7Schristos</P>
631*946379e7Schristos<P>
632*946379e7Schristos<A NAME="IDX130"></A>
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project
prior to using these PO file commands.  This is easy to do.  In any
shell window, change the directory to the root of your project, then
execute a command resembling:
638*946379e7Schristos
639*946379e7Schristos</P>
640*946379e7Schristos
641*946379e7Schristos<PRE>
642*946379e7Schristosetags src/*.[hc] lib/*.[hc]
643*946379e7Schristos</PRE>
644*946379e7Schristos
645*946379e7Schristos<P>
646*946379e7Schristospresuming here you want to process all <TT>&lsquo;.h&rsquo;</TT> and <TT>&lsquo;.c&rsquo;</TT> files
647*946379e7Schristosfrom the <TT>&lsquo;src/&rsquo;</TT> and <TT>&lsquo;lib/&rsquo;</TT> directories.  This command will
648*946379e7Schristosexplore all said files and create a <TT>&lsquo;TAGS&rsquo;</TT> file in your root
649*946379e7Schristosdirectory, somewhat summarizing the contents using a special file
650*946379e7Schristosformat Emacs can understand.
651*946379e7Schristos
652*946379e7Schristos</P>
653*946379e7Schristos<P>
654*946379e7Schristos<A NAME="IDX131"></A>
655*946379e7SchristosFor packages following the GNU coding standards, there is
656*946379e7Schristosa make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
657*946379e7Schristosall directories and for all files containing source code.
658*946379e7Schristos
659*946379e7Schristos</P>
660*946379e7Schristos<P>
Once your <TT>&lsquo;TAGS&rsquo;</TT> file is ready, the following commands assist
the programmer in marking translatable strings in the set of sources.
663*946379e7SchristosBut these commands are necessarily driven from within a PO file
664*946379e7Schristoswindow, and it is likely that you do not even have such a PO file yet.
665*946379e7SchristosThis is not a problem at all, as you may safely open a new, empty PO
666*946379e7Schristosfile, mainly for using these commands.  This empty PO file will slowly
667*946379e7Schristosfill in while you mark strings as translatable in your program sources.
668*946379e7Schristos
669*946379e7Schristos</P>
670*946379e7Schristos<DL COMPACT>
671*946379e7Schristos
672*946379e7Schristos<DT><KBD>,</KBD>
673*946379e7Schristos<DD>
674*946379e7Schristos<A NAME="IDX132"></A>
675*946379e7SchristosSearch through program sources for a string which looks like a
676*946379e7Schristoscandidate for translation (<CODE>po-tags-search</CODE>).
677*946379e7Schristos
678*946379e7Schristos<DT><KBD>M-,</KBD>
679*946379e7Schristos<DD>
680*946379e7Schristos<A NAME="IDX133"></A>
681*946379e7SchristosMark the last string found with <SAMP>&lsquo;_()&rsquo;</SAMP> (<CODE>po-mark-translatable</CODE>).
682*946379e7Schristos
683*946379e7Schristos<DT><KBD>M-.</KBD>
684*946379e7Schristos<DD>
685*946379e7Schristos<A NAME="IDX134"></A>
686*946379e7SchristosMark the last string found with a keyword taken from a set of possible
687*946379e7Schristoskeywords.  This command with a prefix allows some management of these
688*946379e7Schristoskeywords (<CODE>po-select-mark-and-mark</CODE>).
689*946379e7Schristos
690*946379e7Schristos</DL>
691*946379e7Schristos
692*946379e7Schristos<P>
693*946379e7Schristos<A NAME="IDX135"></A>
694*946379e7SchristosThe <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
695*946379e7Schristosoccurrence of a string which looks like a possible candidate for
696*946379e7Schristostranslation, and displays the program source in another Emacs window,
697*946379e7Schristospositioned in such a way that the string is near the top of this other
698*946379e7Schristoswindow.  If the string is too big to fit whole in this window, it is
699*946379e7Schristospositioned so only its end is shown.  In any case, the cursor
700*946379e7Schristosis left in the PO file window.  If the shown string would be better
701*946379e7Schristospresented differently in different native languages, you may mark it
702*946379e7Schristosusing <KBD>M-,</KBD> or <KBD>M-.</KBD>.  Otherwise, you might rather ignore it
703*946379e7Schristosand skip to the next string by merely repeating the <KBD>,</KBD> command.
704*946379e7Schristos
705*946379e7Schristos</P>
706*946379e7Schristos<P>
707*946379e7SchristosA string is a good candidate for translation if it contains a sequence
708*946379e7Schristosof three or more letters.  A string containing at most two letters in
709*946379e7Schristosa row will be considered as a candidate if it has more letters than
710*946379e7Schristosnon-letters.  The command disregards strings containing no letters,
711*946379e7Schristosor isolated letters only.  It also disregards strings within comments,
712*946379e7Schristosor strings already marked with some keyword PO mode knows (see below).
713*946379e7Schristos
714*946379e7Schristos</P>
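<P>
The rule above can be sketched in C as follows.  This is only one
reading of the description, not the code PO mode itself uses, and the
function name <CODE>looks_translatable</CODE> is an assumption made for the
example.  It treats a string whose longest run of letters is exactly
two as a candidate only when letters outnumber non-letters, and it
rejects strings with no letters or with isolated letters only.  It does
not model the skipping of comments or of already marked strings.
</P>

```c
#include <ctype.h>
#include <stdbool.h>

/* Sketch of the candidate rule described above; NOT PO mode's code.  */
static bool
looks_translatable (const char *s)
{
  int letters = 0, others = 0, run = 0, longest = 0;

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)
    return true;                /* three or more letters in a row */
  if (longest == 2)
    return letters > others;    /* short runs need a letter majority */
  return false;                 /* no letters, or isolated letters only */
}
```

<P>
Under this reading, <CODE>"File not found"</CODE> and <CODE>"ok"</CODE> are candidates,
while <CODE>"%s"</CODE> and <CODE>"%d: %s"</CODE> are not.
</P>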
715*946379e7Schristos<P>
716*946379e7SchristosIf you have never told Emacs about some <TT>&lsquo;TAGS&rsquo;</TT> file to use, the
717*946379e7Schristoscommand will request that you specify one from the minibuffer, the
718*946379e7Schristosfirst time you use the command.  You may later change your <TT>&lsquo;TAGS&rsquo;</TT>
719*946379e7Schristosfile by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
720*946379e7Schristoswhich will ask you to name the precise <TT>&lsquo;TAGS&rsquo;</TT> file you want
721*946379e7Schristosto use.  See section ‘Tag Tables’ in <CITE>The Emacs Editor</CITE>.
722*946379e7Schristos
723*946379e7Schristos</P>
724*946379e7Schristos<P>
725*946379e7SchristosEach time you use the <KBD>,</KBD> command, the search resumes from where it was
726*946379e7Schristosleft by the previous search, and goes through all program sources,
727*946379e7Schristosobeying the <TT>&lsquo;TAGS&rsquo;</TT> file, until all sources have been processed.
However, by giving a prefix argument to the command (<KBD>C-u
,</KBD>), you may request that the search be restarted all over again
730*946379e7Schristosfrom the first program source; but in this case, strings that you
731*946379e7Schristosrecently marked as translatable will be automatically skipped.
732*946379e7Schristos
733*946379e7Schristos</P>
734*946379e7Schristos<P>
Using this <KBD>,</KBD> command does not prevent the use of other regular
Emacs tags commands.  For example, regular <CODE>tags-search</CODE> or
<CODE>tags-query-replace</CODE> commands may be used without disrupting the
independent <KBD>,</KBD> search sequence.  However, as implemented, the
<EM>initial</EM> <KBD>,</KBD> command (or the <KBD>,</KBD> command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
742*946379e7Schristos
743*946379e7Schristos</P>
744*946379e7Schristos<P>
745*946379e7Schristos<A NAME="IDX136"></A>
746*946379e7Schristos<A NAME="IDX137"></A>
747*946379e7SchristosThe <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
748*946379e7Schristosrecently found string with the <SAMP>&lsquo;_&rsquo;</SAMP> keyword.  The <KBD>M-.</KBD>
749*946379e7Schristos(<CODE>po-select-mark-and-mark</CODE>) command will request that you type
750*946379e7Schristosone keyword from the minibuffer and use that keyword for marking
751*946379e7Schristosthe string.  Both commands will automatically create a new PO file
752*946379e7Schristosuntranslated entry for the string being marked, and make it the
753*946379e7Schristoscurrent entry (making it easy for you to immediately proceed to its
754*946379e7Schristostranslation, if you feel like doing it right away).  It is possible
755*946379e7Schristosthat the modifications made to the program source by <KBD>M-,</KBD> or
756*946379e7Schristos<KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
757*946379e7Schristosto break and re-indent this line differently.  You may use the <KBD>O</KBD>
758*946379e7Schristoscommand from PO mode, or any other window changing command from
759*946379e7SchristosEmacs, to break out into the program source window, and do any
needed adjustments.  You will have to use some regular Emacs command
to return the cursor to the PO file window before using the
<KBD>,</KBD> command again to find the next string.
763*946379e7Schristos
764*946379e7Schristos</P>
765*946379e7Schristos<P>
766*946379e7SchristosThe <KBD>M-.</KBD> command has a few built-in speedups, so you do not
767*946379e7Schristoshave to explicitly type all keywords all the time.  The first such
768*946379e7Schristosspeedup is that you are presented with a <EM>preferred</EM> keyword,
which you may accept by merely typing <KBD>RET</KBD> at the prompt.
770*946379e7SchristosThe second speedup is that you may type any non-ambiguous prefix of the
771*946379e7Schristoskeyword you really mean, and the command will complete it automatically
772*946379e7Schristosfor you.  This also means that PO mode has to <EM>know</EM> all
773*946379e7Schristosyour possible keywords, and that it will not accept mistyped keywords.
774*946379e7Schristos
775*946379e7Schristos</P>
776*946379e7Schristos<P>
777*946379e7SchristosIf you reply <KBD>?</KBD> to the keyword request, the command gives a
778*946379e7Schristoslist of all known keywords, from which you may choose.  When the
779*946379e7Schristoscommand is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
780*946379e7Schristosupdating any program source or PO file buffer, and does some simple
781*946379e7Schristoskeyword management instead.  In this case, the command asks for a
782*946379e7Schristoskeyword, written in full, which becomes a new allowed keyword for
783*946379e7Schristoslater <KBD>M-.</KBD> commands.  Moreover, this new keyword automatically
784*946379e7Schristosbecomes the <EM>preferred</EM> keyword for later commands.  By typing
785*946379e7Schristosan already known keyword in response to <KBD>C-u M-.</KBD>, one merely
786*946379e7Schristoschanges the <EM>preferred</EM> keyword and does nothing more.
787*946379e7Schristos
788*946379e7Schristos</P>
789*946379e7Schristos<P>
790*946379e7SchristosAll keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
791*946379e7Schristoswhen scanning for strings, and strings already marked by any of those
792*946379e7Schristosknown keywords are automatically skipped.  If many PO files are opened
793*946379e7Schristossimultaneously, each one has its own independent set of known keywords.
There is currently no provision in PO mode for deleting a known
keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh.  When a PO file is newly brought up in an Emacs window, only
<SAMP>&lsquo;gettext&rsquo;</SAMP> and <SAMP>&lsquo;_&rsquo;</SAMP> are known as keywords, and <SAMP>&lsquo;gettext&rsquo;</SAMP>
is preferred for the <KBD>M-.</KBD> command.  In fact, there is no point in
preferring <SAMP>&lsquo;_&rsquo;</SAMP>, as it is already built into the <KBD>M-,</KBD> command.
800*946379e7Schristos
801*946379e7Schristos</P>
802*946379e7Schristos
803*946379e7Schristos
804*946379e7Schristos<H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">4.6  Special Comments preceding Keywords</A></H2>
805*946379e7Schristos
806*946379e7Schristos<P>
807*946379e7Schristos<A NAME="IDX138"></A>
808*946379e7SchristosIn C programs strings are often used within calls of functions from the
809*946379e7Schristos<CODE>printf</CODE> family.  The special thing about these format strings is
810*946379e7Schristosthat they can contain format specifiers introduced with <KBD>%</KBD>.  Assume
811*946379e7Schristoswe have the code
812*946379e7Schristos
813*946379e7Schristos</P>
814*946379e7Schristos
815*946379e7Schristos<PRE>
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
817*946379e7Schristos</PRE>
818*946379e7Schristos
819*946379e7Schristos<P>
820*946379e7SchristosA possible German translation for the above string might be:
821*946379e7Schristos
822*946379e7Schristos</P>
823*946379e7Schristos
824*946379e7Schristos<PRE>
825*946379e7Schristos"%d Zeichen lang ist die Zeichenkette `%s'"
826*946379e7Schristos</PRE>
827*946379e7Schristos
828*946379e7Schristos<P>
A C programmer, even one who cannot speak German, will recognize that
there is something wrong here.  The order of the two format specifiers
has changed, but of course the order of the arguments in the <CODE>printf</CODE> call has not.
This will most probably lead to problems, because now the length of the
string is interpreted as the address of a string.
834*946379e7Schristos
835*946379e7Schristos</P>
836*946379e7Schristos<P>
To prevent errors at runtime caused by translations, the <CODE>msgfmt</CODE>
tool can check statically whether the arguments in the original and the
translation string match in type and number.  If this is not the case
and the <SAMP>&lsquo;-c&rsquo;</SAMP> option has been passed to <CODE>msgfmt</CODE>, <CODE>msgfmt</CODE>
will give an error and refuse to produce a MO file.  Thus consistent
use of <SAMP>&lsquo;msgfmt -c&rsquo;</SAMP> will catch the error, so that it cannot
cause problems at runtime.
844*946379e7Schristos
845*946379e7Schristos</P>
846*946379e7Schristos<P>
For the word order of the above German translation to work correctly, one
would have to write
849*946379e7Schristos
850*946379e7Schristos</P>
851*946379e7Schristos
852*946379e7Schristos<PRE>
853*946379e7Schristos"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
854*946379e7Schristos</PRE>
855*946379e7Schristos
856*946379e7Schristos<P>
857*946379e7SchristosThe routines in <CODE>msgfmt</CODE> know about this special notation.
858*946379e7Schristos
859*946379e7Schristos</P>
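<P>
The effect of the numbered notation can be tried out with a small
sketch like the one below.  The helper name <CODE>describe_string</CODE> is
hypothetical, and the translated format is passed in directly instead of
coming from a message catalog.  Numbered arguments such as
<CODE>%1$s</CODE> are a POSIX extension to ISO C, so this assumes a C library
that supports them.
</P>

```c
#include <stdio.h>
#include <string.h>

/* The arguments are always passed in the order (string, length).
   A format using "%1$s" and "%2$d" selects them by position, so a
   translation may mention them in either order.  */
static int
describe_string (char *buf, size_t size, const char *format, const char *s)
{
  return snprintf (buf, size, format, s, (int) strlen (s));
}
```

<P>
Called with the English format and with the corrected German format,
the same argument list yields correct output in both cases.
</P>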
860*946379e7Schristos<P>
Because not all strings in a program are format strings, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>&lsquo;.po&rsquo;</TT> file.
Doing so might cause problems, because a string might contain what looks
like a format specifier even though it is never used in <CODE>printf</CODE>.
865*946379e7Schristos
866*946379e7Schristos</P>
867*946379e7Schristos<P>
Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
thinks might be a format string.  There is no absolute rule for this,
870*946379e7Schristosonly a heuristic.  In the <TT>&lsquo;.po&rsquo;</TT> file the entry is marked using the
871*946379e7Schristos<CODE>c-format</CODE> flag in the <CODE>#,</CODE> comment line (see section <A HREF="gettext_3.html#SEC10">3  The Format of PO Files</A>).
872*946379e7Schristos
873*946379e7Schristos</P>
874*946379e7Schristos<P>
875*946379e7Schristos<A NAME="IDX139"></A>
876*946379e7Schristos<A NAME="IDX140"></A>
The careful reader now might say that this again can cause problems.
The heuristic might guess wrong.  This is true, and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision.  If, on the same line as the
<CODE>gettext</CODE> keyword or on the line immediately preceding it,
the <CODE>xgettext</CODE> program finds a comment containing the words
<CODE>xgettext:c-format</CODE>, it will mark the string in any case with
the <CODE>c-format</CODE> flag.  This kind of comment should be used when
<CODE>xgettext</CODE> does not recognize the string as a format string but
it really is one and it should be tested.  Please note that when the
comment is on the same line as the <CODE>gettext</CODE> keyword, it must be
before the string to be translated.
889*946379e7Schristos
890*946379e7Schristos</P>
891*946379e7Schristos<P>
892*946379e7SchristosThis situation happens quite often.  The <CODE>printf</CODE> function is often
893*946379e7Schristoscalled with strings which do not contain a format specifier.  Of course
894*946379e7Schristosone would normally use <CODE>fputs</CODE> but it does happen.  In this case
895*946379e7Schristos<CODE>xgettext</CODE> does not recognize this as a format string but what
896*946379e7Schristoshappens if the translation introduces a valid format specifier?  The
897*946379e7Schristos<CODE>printf</CODE> function will try to access one of the parameters but none
898*946379e7Schristosexists because the original code does not pass any parameters.
899*946379e7Schristos
900*946379e7Schristos</P>
901*946379e7Schristos<P>
Of course, <CODE>xgettext</CODE> could also make a wrong decision the other way
round, i.e. a string marked as a format string actually is not a format
string.  In this case <CODE>msgfmt</CODE> might give too many warnings and
would prevent the <TT>&lsquo;.po&rsquo;</TT> file from being compiled.  The method to prevent
this wrong decision is similar to the one used above, only the comment
to use must contain the string <CODE>xgettext:no-c-format</CODE>.
908*946379e7Schristos
909*946379e7Schristos</P>
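<P>
Both kinds of comment might be used as in the following sketch.  The
function names and message texts are invented for the example, and
whether <CODE>xgettext</CODE>'s heuristic would actually misjudge these
particular strings is an assumption.
</P>

```c
#include <libintl.h>

/* No '%' in the original, but callers pass the result to printf as a
   format.  Forcing the flag lets msgfmt -c reject a translation that
   introduces a format specifier.  */
static const char *
done_format (void)
{
  /* xgettext:c-format */
  return gettext ("Operation completed\n");
}

/* Contains '%', but callers print it with fputs, so it is not a
   format string and should not be checked as one.  */
static const char *
disk_full_text (void)
{
  /* xgettext:no-c-format */
  return gettext ("The disk is 100% full\n");
}
```

<P>
In both functions the comment sits on the line immediately preceding
the <CODE>gettext</CODE> keyword, which is one of the two placements described
above.
</P>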
910*946379e7Schristos<P>
911*946379e7SchristosIf a string is marked with <CODE>c-format</CODE> and this is not correct the
912*946379e7Schristosuser can find out who is responsible for the decision.  See
913*946379e7Schristossection <A HREF="gettext_5.html#SEC22">5.1  Invoking the <CODE>xgettext</CODE> Program</A> to see how the <CODE>--debug</CODE> option can be
914*946379e7Schristosused for solving this problem.
915*946379e7Schristos
916*946379e7Schristos</P>
917*946379e7Schristos
918*946379e7Schristos
919*946379e7Schristos<H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">4.7  Special Cases of Translatable Strings</A></H2>
920*946379e7Schristos
921*946379e7Schristos<P>
922*946379e7Schristos<A NAME="IDX141"></A>
The attentive reader might now point out that it is not always possible
to mark a translatable string with <CODE>gettext</CODE> or a similar keyword.
Consider the following case:
926*946379e7Schristos
927*946379e7Schristos</P>
928*946379e7Schristos
929*946379e7Schristos<PRE>
{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  ...
}
943*946379e7Schristos</PRE>
944*946379e7Schristos
945*946379e7Schristos<P>
While it is no problem to mark the string <CODE>"a default message"</CODE>, it
is not possible to mark the string initializers for <CODE>messages</CODE>,
because a call to <CODE>gettext</CODE> is not a constant expression and so
cannot appear in a static initializer.
What is to be done?  We have to fulfill two tasks.  First we have to mark the
strings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_5.html#SEC22">5.1  Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second we have to translate the strings at runtime
before printing them.
952*946379e7Schristos
953*946379e7Schristos</P>
954*946379e7Schristos<P>
955*946379e7SchristosThe first task can be fulfilled by creating a new keyword, which names a
956*946379e7Schristosno-op.  For the second we have to mark all access points to a string
957*946379e7Schristosfrom the array.  So one solution can look like this:
958*946379e7Schristos
959*946379e7Schristos</P>
960*946379e7Schristos
961*946379e7Schristos<PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  ...
}
977*946379e7Schristos</PRE>
978*946379e7Schristos
979*946379e7Schristos<P>
Please convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in any case.  How to make <CODE>xgettext</CODE> recognize
the additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_5.html#SEC22">5.1  Invoking the <CODE>xgettext</CODE> Program</A>.
983*946379e7Schristos
984*946379e7Schristos</P>
985*946379e7Schristos<P>
The above is of course not the only solution.  You could also use
the following one:
988*946379e7Schristos
989*946379e7Schristos</P>
990*946379e7Schristos
991*946379e7Schristos<PRE>
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  ...
  string
    = index &#62; 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  ...
}
1007*946379e7Schristos</PRE>
1008*946379e7Schristos
1009*946379e7Schristos<P>
But this has a drawback.  The programmer has to take care to
use <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
Using <CODE>gettext</CODE> there would translate the string twice, which could
in rare cases have unpredictable results.
1013*946379e7Schristos
1014*946379e7Schristos</P>
1015*946379e7Schristos<P>
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case.  But this analysis is
generally not very difficult.  If it does turn out to be difficult in
some situation, you can use this second method there.
1020*946379e7Schristos
1021*946379e7Schristos</P>
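<P>
For reference, the second method can be written out as a complete
function, as in the sketch below.  The function name <CODE>message_for</CODE>
is hypothetical, and the bounds check on <CODE>index</CODE> is an addition
made here so that the example is safe for any argument; it replaces the
simpler <CODE>index &#62; 1</CODE> test of the fragments above.
</P>

```c
#include <libintl.h>
#include <stddef.h>

#define gettext_noop(String) String

static const char *const messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Second method: every string is only marked with gettext_noop, and a
   single gettext call at the point of use translates whichever string
   was selected.  */
static const char *
message_for (int index)
{
  const size_t count = sizeof messages / sizeof messages[0];
  const char *string
    = (index < 0 || (size_t) index >= count)
      ? gettext_noop ("a default message")
      : messages[index];

  return gettext (string);
}
```

<P>
Because <CODE>gettext</CODE> is called exactly once on every path, no string
is left untranslated and none is translated twice.
</P>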
1022*946379e7Schristos
1023*946379e7Schristos
1024*946379e7Schristos<H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">4.8  Marking Proper Names for Translation</A></H2>
1025*946379e7Schristos
1026*946379e7Schristos<P>
1027*946379e7SchristosShould names of persons, cities, locations etc. be marked for translation
1028*946379e7Schristosor not?  People who only know languages that can be written with Latin
1029*946379e7Schristosletters (English, Spanish, French, German, etc.) are tempted to say “no”,
1030*946379e7Schristosbecause names usually do not change when transported between these languages.
1031*946379e7SchristosHowever, in general when translating from one script to another, names
1032*946379e7Schristosare translated too, usually phonetically or by transliteration.  For
1033*946379e7Schristosexample, Russian or Greek names are converted to the Latin alphabet when
1034*946379e7Schristosbeing translated to English, and English or French names are converted
1035*946379e7Schristosto the Katakana script when being translated to Japanese.  This is
1036*946379e7Schristosnecessary because the speakers of the target language in general cannot
1037*946379e7Schristosread the script the name is originally written in.
1038*946379e7Schristos
1039*946379e7Schristos</P>
1040*946379e7Schristos<P>
1041*946379e7SchristosAs a programmer, you should therefore make sure that names are marked
1042*946379e7Schristosfor translation, with a special comment telling the translators that it
1043*946379e7Schristosis a proper name and how to pronounce it.  Like this:
1044*946379e7Schristos
1045*946379e7Schristos</P>
1046*946379e7Schristos
1047*946379e7Schristos<PRE>
1048*946379e7Schristosprintf (_("Written by %s.\n"),
1049*946379e7Schristos        /* TRANSLATORS: This is a proper name.  See the gettext
1050*946379e7Schristos           manual, section Names.  Note this is actually a non-ASCII
1051*946379e7Schristos           name: The first name is (with Unicode escapes)
1052*946379e7Schristos           "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
1053*946379e7Schristos           Pronunciation is like "fraa-swa pee-nar".  */
1054*946379e7Schristos        _("Francois Pinard"));
1055*946379e7Schristos</PRE>
1056*946379e7Schristos
1057*946379e7Schristos<P>
1058*946379e7SchristosAs a translator, you should use some care when translating names, because
1059*946379e7Schristosit is frustrating if people see their names mutilated or distorted.  If
1060*946379e7Schristosyour language uses the Latin script, all you need to do is to reproduce
1061*946379e7Schristosthe name as perfectly as you can within the usual character set of your
language.  In this particular case, this means providing a translation
1063*946379e7Schristoscontaining the c-cedilla character.  If your language uses a different
1064*946379e7Schristosscript and the people speaking it don't usually read Latin words, it means
1065*946379e7Schristostransliteration; but you should still give, in parentheses, the original
1066*946379e7Schristoswriting of the name -- for the sake of the people that do read the Latin
1067*946379e7Schristosscript.  Here is an example, using Greek as the target script:
1068*946379e7Schristos
1069*946379e7Schristos</P>
1070*946379e7Schristos
1071*946379e7Schristos<PRE>
1072*946379e7Schristos#. This is a proper name.  See the gettext
1073*946379e7Schristos#. manual, section Names.  Note this is actually a non-ASCII
1074*946379e7Schristos#. name: The first name is (with Unicode escapes)
1075*946379e7Schristos#. "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
1076*946379e7Schristos#. Pronunciation is like "fraa-swa pee-nar".
1077*946379e7Schristosmsgid "Francois Pinard"
msgstr "φρασοα πιναρ"
1079*946379e7Schristos       " (Francois Pinard)"
1080*946379e7Schristos</PRE>
1081*946379e7Schristos
1082*946379e7Schristos<P>
1083*946379e7SchristosBecause translation of names is such a sensitive domain, it is a good
1084*946379e7Schristosidea to test your translation before submitting it.
1085*946379e7Schristos
1086*946379e7Schristos</P>
1087*946379e7Schristos<P>
1088*946379e7SchristosThe translation project <A HREF="http://sourceforge.net/projects/translation">http://sourceforge.net/projects/translation</A>
1089*946379e7Schristoshas set up a POT file and translation domain consisting of program author
1090*946379e7Schristosnames, with better facilities for the translator than those presented here.
1091*946379e7SchristosNamely, there the original name is written directly in Unicode (rather
1092*946379e7Schristosthan with Unicode escapes or HTML entities), and the pronunciation is
1093*946379e7Schristosdenoted using the International Phonetic Alphabet (see
1094*946379e7Schristos<A HREF="http://www.wikipedia.org/wiki/International_Phonetic_Alphabet">http://www.wikipedia.org/wiki/International_Phonetic_Alphabet</A>).
1095*946379e7Schristos
1096*946379e7Schristos</P>
1097*946379e7Schristos<P>
1098*946379e7SchristosHowever, we don't recommend this approach for all POT files in all packages,
1099*946379e7Schristosbecause this would force translators to use PO files in UTF-8 encoding,
1100*946379e7Schristoswhich is - in the current state of software (as of 2003) - a major hassle
1101*946379e7Schristosfor translators using GNU Emacs or XEmacs with po-mode.
1102*946379e7Schristos
1103*946379e7Schristos</P>
1104*946379e7Schristos
1105*946379e7Schristos
1106*946379e7Schristos<H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">4.9  Preparing Library Sources</A></H2>
1107*946379e7Schristos
<P>
When you are preparing a library, not a program, for the use of
<CODE>gettext</CODE>, only a few details are different.  Here we assume that
the library has a translation domain and a POT file of its own.  (If
it uses the translation domain and POT file of the main program, then
the previous sections apply without changes.)

</P>

<OL>
<LI>

The library code doesn't call <CODE>setlocale (LC_ALL, "")</CODE>.  It's the
responsibility of the main program to set the locale.  The library's
documentation should mention this fact, so that developers of programs
using the library are aware of it.

<LI>

The library code doesn't call <CODE>textdomain (PACKAGE)</CODE>, because it
would interfere with the text domain set by the main program.

<LI>

The initialization code for a program was


<PRE>
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
</PRE>

For a library it is reduced to


<PRE>
  bindtextdomain (PACKAGE, LOCALEDIR);
</PRE>

If your library's API doesn't already have an initialization function,
you need to create one, containing at least the <CODE>bindtextdomain</CODE>
invocation.  However, you usually don't need to export and document this
initialization function: it is sufficient that all entry points of the
library call the initialization function if it hasn't been called before.
The typical idiom used to achieve this is a static boolean variable that
indicates whether the initialization function has been called, like this:


<PRE>
#include &#60;stdbool.h&#62;
#include &#60;libintl.h&#62;

/* Records whether libfoo_initialize() has already run.  */
static bool libfoo_initialized;

static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* This function is part of the exported API.  */
struct foo *
create_foo (...)
{
  /* Must ensure the initialization is performed.  */
  if (!libfoo_initialized)
    libfoo_initialize ();
  ...
}

/* This function is part of the exported API.  The argument must be
   non-NULL and have been created through create_foo().  */
int
foo_refcount (struct foo *argument)
{
  /* No need to invoke the initialization function here, because
     create_foo() must already have been called before.  */
  ...
}
</PRE>

<LI>

The usual declaration of the <SAMP>&lsquo;_&rsquo;</SAMP> macro in each source file was


<PRE>
#include &#60;libintl.h&#62;
#define _(String) gettext (String)
</PRE>

for a program.  For a library, which has its own translation domain,
it reads like this:


<PRE>
#include &#60;libintl.h&#62;
#define _(String) dgettext (PACKAGE, String)
</PRE>

In other words, <CODE>dgettext</CODE> is used instead of <CODE>gettext</CODE>.
Similarly, the <CODE>dngettext</CODE> function should be used in place of the
<CODE>ngettext</CODE> function.
</OL>
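
<P>
The points above can be combined into one small source file.  The following
is only a minimal sketch, not a definitive implementation: the fallback
values for <CODE>PACKAGE</CODE> and <CODE>LOCALEDIR</CODE> and the functions
<CODE>libfoo_status_message</CODE> and <CODE>libfoo_describe_count</CODE> are
hypothetical names chosen for illustration.  A real library would normally
get <CODE>PACKAGE</CODE> and <CODE>LOCALEDIR</CODE> from its build system.
</P>

```c
#include <libintl.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed fallback values, for illustration only.  */
#ifndef PACKAGE
# define PACKAGE "libfoo"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Library variant of the macro: always look up in the library's domain.  */
#define _(String) dgettext (PACKAGE, String)

/* Records whether libfoo_initialize() has already run.  */
static bool libfoo_initialized;

/* Internal initialization: no setlocale() and no textdomain() call,
   only the binding of the library's own domain.  */
static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* Hypothetical exported entry point returning a translated message.  */
const char *
libfoo_status_message (void)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return _("operation completed");
}

/* Hypothetical exported entry point using dngettext for plural forms.
   Writes the formatted text into BUF and returns snprintf's result.  */
int
libfoo_describe_count (char *buf, size_t size, unsigned long n)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return snprintf (buf, size,
                   dngettext (PACKAGE, "%lu item", "%lu items", n), n);
}
```

<P>
A program using this sketch would still call <CODE>setlocale (LC_ALL, "")</CODE>
itself.  Without that call, or when no message catalog for the library's
domain is installed, both functions return the untranslated English strings.
</P>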
</BODY>
</HTML>