<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 27 November 2006 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 11  The Programmer's View</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_10.html">previous</A>, <A HREF="gettext_12.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC164" HREF="gettext_toc.html#TOC164">11  The Programmer's View</A></H1>

<P>
One aim of the current message catalog implementation provided by
GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
installer wishes to do so.  So we should perhaps first take a look at
the solutions we know about.  The people on the POSIX committee did not
manage to agree on one of the semi-official standards which we'll
describe below.  In fact they couldn't agree on anything, so they decided
only to include an example of an interface.  The major Unix vendors
are split between the two most important specifications: X/Open's
catgets vs. Uniforum's gettext interface.  We'll describe them both and
later explain our solution to this dilemma.

</P>

<H2><A NAME="SEC165" HREF="gettext_toc.html#TOC165">11.1  About <CODE>catgets</CODE></A></H2>
<P>
<A NAME="IDX1006"></A>

</P>
<P>
The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
Guide, Volume 3, XSI Supplementary Definitions, Chapter 5.  But the
process of creating this standard seemed to be too slow for some of
the Unix vendors, so they based their implementations on preliminary
versions of the standard.  Of course this again leads to problems when
writing platform-independent programs: even the use of <CODE>catgets</CODE>
does not guarantee a uniform interface.

</P>
<P>
Another, personal comment on this: only a bunch of committee members
could have made this interface.  They never really tried to program
using this interface.  It is a fast, memory-saving implementation; a
user can happily live with it.  But programmers hate it (at least I and
some others do...)

</P>
<P>
But we must not forget one point: after all the trouble with transferring
the rights to Unix(tm), they at last came to X/Open, the very same who
published this specification.  This leads me to predict
that this interface will be in future Unix standards (e.g. Spec1170) and
therefore part of all Unix implementations (implementations which are
<EM>allowed</EM> to bear this name).

</P>

<H3><A NAME="SEC166" HREF="gettext_toc.html#TOC166">11.1.1  The Interface</A></H3>
<P>
<A NAME="IDX1007"></A>

</P>
<P>
The interface to the <CODE>catgets</CODE> implementation consists of three
functions which correspond to those used in file access: <CODE>catopen</CODE>
to open the catalog for use, <CODE>catgets</CODE> for accessing the message
tables, and <CODE>catclose</CODE> for closing after work is done.  Prototypes
for the functions and the needed definitions are in the
<CODE>&#60;nl_types.h&#62;</CODE> header file.

</P>
<P>
<A NAME="IDX1008"></A>
<CODE>catopen</CODE> is used like this:

</P>

<PRE>
nl_catd catd = catopen ("catalog_name", 0);
</PRE>

<P>
The function takes as its first argument the name of the catalog.  This usually
refers to the name of the program or the package.  The second parameter
is not further specified in the standard.  I don't even know whether it
is implemented consistently among various systems.  So the common advice
is to use <CODE>0</CODE> as the value.  The return value is a handle to the
message catalog, equivalent to the file handles returned by <CODE>open</CODE>.

</P>
<P>
<A NAME="IDX1009"></A>
This handle is of course used in the <CODE>catgets</CODE> function which can
be used like this:

</P>

<PRE>
char *translation = catgets (catd, set_no, msg_id, "original string");
</PRE>

<P>
The first parameter is this catalog descriptor.  The second parameter
specifies the set of messages in this catalog from which the message
identified by <CODE>msg_id</CODE> is obtained.  <CODE>catgets</CODE> therefore uses a
three-stage addressing:

</P>

<PRE>
catalog name => set number => message ID => translation
</PRE>

<P>
The fourth argument is not used to address the translation.  It is given
as a default value in case one of the addressing stages fails.  One
important thing to remember is that although the return type of catgets
is <CODE>char *</CODE> the resulting string <EM>must not</EM> be changed.  It
should rather be <CODE>const char *</CODE>, but the standard was published in
1988, one year before ANSI C.

</P>
<P>
<A NAME="IDX1010"></A>
The last of these functions is used and behaves as expected:

</P>

<PRE>
catclose (catd);
</PRE>

<P>
After this no <CODE>catgets</CODE> call using the descriptor is legal anymore.

</P>
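<P>
Putting the three calls together, a lookup might be sketched as follows.
The function name <CODE>fetch_greeting</CODE>, the set and message numbers,
and the default text are hypothetical and serve only as illustration; a
real program has to use the numbers from its own catalog source.

</P>

```c
#include <assert.h>
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical numbering; it has to match the catalog source exactly.  */
enum { DEMO_SET = 1, DEMO_MSG_GREETING = 1 };

/* Copy the greeting into BUF, using the default text when the catalog
   cannot be opened or does not contain the message.  */
void
fetch_greeting (const char *catalog_name, char *buf, size_t buflen)
{
  const char *text = "Hello, world";   /* default, used if lookup fails */
  nl_catd catd;

  assert (buf != NULL && buflen > 0);

  catd = catopen (catalog_name, 0);
  if (catd != (nl_catd) -1)
    text = catgets (catd, DEMO_SET, DEMO_MSG_GREETING, text);

  /* Copy before closing: the string returned by catgets may belong to
     the catalog and should not be relied upon after catclose.  */
  snprintf (buf, buflen, "%s", text);

  if (catd != (nl_catd) -1)
    catclose (catd);
}
```

<P>
Called with a catalog name that cannot be found, the sketch leaves the
default text <CODE>Hello, world</CODE> in <CODE>buf</CODE>.

</P>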

<H3><A NAME="SEC167" HREF="gettext_toc.html#TOC167">11.1.2  Problems with the <CODE>catgets</CODE> Interface?!</A></H3>
<P>
<A NAME="IDX1011"></A>

</P>
<P>
This description seems really easy -- so where are the
problems we speak of?  In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain.  The
reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
message ID.  This has to be a numeric value, unique among all messages in a single
set.  Perhaps you can imagine the problems of keeping such a list up to date while
changing the source code.  Add a new message here, remove one there.  Of
course a lot of tools have been developed to help organize this
chaos, but each of them fails in one aspect or another.  We don't
want to say that the other approach has no problems, but they are far
easier to manage.

</P>

<H2><A NAME="SEC168" HREF="gettext_toc.html#TOC168">11.2  About <CODE>gettext</CODE></A></H2>
<P>
<A NAME="IDX1012"></A>

</P>
<P>
The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
proposal.  It was submitted there by Sun, who had implemented the
<CODE>gettext</CODE> function in SunOS 4, around 1990.  Nowadays, the
<CODE>gettext</CODE> interface is specified by the OpenI18N standard.

</P>
<P>
The main point about this solution is that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here as well, but this key is the message
itself (however long or short it is).  See section <A HREF="gettext_11.html#SEC176">11.3  Comparing the Two Interfaces</A> for a more
detailed comparison of the two methods.

</P>
<P>
The following section contains a rather detailed description of the
interface.  We make it this detailed because this is the interface
we chose for the GNU <CODE>gettext</CODE> Library.  Programmers interested
in using this library will be interested in this description.

</P>

<H3><A NAME="SEC169" HREF="gettext_toc.html#TOC169">11.2.1  The Interface</A></H3>
<P>
<A NAME="IDX1013"></A>

</P>
<P>
The minimal functionality an interface must have is a) to select a
domain the strings are coming from (a single domain for all programs is
not reasonable because its construction and maintenance is difficult,
perhaps impossible) and b) to access a string in a selected domain.

</P>
<P>
This is essentially the description of the <CODE>gettext</CODE> interface.  It
has a global domain which unqualified usages reference.  Of course this
domain is selectable by the user.

</P>

<PRE>
char *textdomain (const char *domain_name);
</PRE>

<P>
This provides the possibility to change or query the current status of
the current global domain of the <CODE>LC_MESSAGES</CODE> category.  The
argument is a null-terminated string, whose characters must be legal
for use in filenames.  If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
the function returns the current value.  If no value has been set
before, the name of the default domain is returned: <EM>messages</EM>.
Please note that although the return value of <CODE>textdomain</CODE> is of
type <CODE>char *</CODE>, it must not be modified.  It is also important to know
that no check is made that the domain is available.  If the name is not
available, you will notice this by the fact that no translations are provided.

</P>
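<P>
The two uses of <CODE>textdomain</CODE>, setting and querying, can be
combined in a small sketch.  The helper name
<CODE>switch_text_domain</CODE> is hypothetical; the sketch also assumes,
as GNU <CODE>gettext</CODE> does, that a <CODE>NULL</CODE> return value from
a setting call signals a failure such as memory exhaustion.

</P>

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/* Switch the global domain and confirm the switch by querying it again.
   Returns 1 on success, 0 otherwise.  */
int
switch_text_domain (const char *domain_name)
{
  const char *now;

  assert (domain_name != NULL);

  if (textdomain (domain_name) == NULL)   /* setting failed */
    return 0;

  now = textdomain (NULL);                /* query only, changes nothing */
  return now != NULL && strcmp (now, domain_name) == 0;
}
```

<P>
Note that a successful return says nothing about whether a catalog for
that domain exists; as described above, no such check is made.

</P>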
<P>
To use a domain set by <CODE>textdomain</CODE> the function

</P>

<PRE>
char *gettext (const char *msgid);
</PRE>

<P>
is to be used.  This is the simplest reasonable form one can imagine.
The translation of the string <VAR>msgid</VAR> is returned if it is available
in the current domain.  If it is not available, the argument itself is
returned.  If the argument is <CODE>NULL</CODE> the result is undefined.

</P>
<P>
One thing which should come to mind is that no explicit dependency on
the domain in use is given.  The current value of the domain for the
<CODE>LC_MESSAGES</CODE> locale is used.  If this changes between two
executions of the same <CODE>gettext</CODE> call in the program, the two calls
reference different message catalogs.

</P>
<P>
For the easiest case, which is normally used in internationalized
packages, once at the beginning of execution a call to <CODE>textdomain</CODE>
is issued, setting the domain to a unique name, normally the package
name.  In the following code all strings which have to be translated are
filtered through the gettext function.  That's all; the package speaks
your language.

</P>
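<P>
This easiest case might be sketched as follows.  The function names
<CODE>init_translations</CODE> and <CODE>greeting</CODE> are hypothetical.
The call to <CODE>setlocale</CODE> is an addition of this sketch: it makes
the program adopt the user's locale settings, which the lookup depends on.

</P>

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

/* Call once at program start.  PACKAGE is the domain name, normally
   the package name.  */
void
init_translations (const char *package)
{
  assert (package != NULL);
  setlocale (LC_ALL, "");   /* adopt the user's locale settings */
  textdomain (package);     /* select the global domain */
}

/* Every user-visible string is filtered through gettext.  If no
   translation is available, the argument itself comes back.  */
const char *
greeting (void)
{
  return gettext ("Hello, world");
}
```

<P>
When no catalog for the chosen domain is installed, <CODE>greeting</CODE>
simply yields the original English string.

</P>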

<H3><A NAME="SEC170" HREF="gettext_toc.html#TOC170">11.2.2  Solving Ambiguities</A></H3>
<P>
<A NAME="IDX1014"></A>
<A NAME="IDX1015"></A>
<A NAME="IDX1016"></A>

</P>
<P>
While this single name domain works well for most applications there
might be the need to get translations from more than one domain.  Of
course one could switch between different domains with calls to
<CODE>textdomain</CODE>, but this is neither convenient nor fast.  A
possible situation could be one case subject to discussion during this
writing:  all
error messages of functions in the set of commonly used functions should
go into a separate domain <CODE>error</CODE>.  By this means we would only need
to translate them once.
Another case is messages from a library, as these <EM>have</EM> to be
independent of the current domain set by the application.

</P>
<P>
For these reasons there are two more functions to retrieve strings:

</P>

<PRE>
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
</PRE>

<P>
Both take an additional argument in the first place, which corresponds
to the argument of <CODE>textdomain</CODE>.  The third argument of
<CODE>dcgettext</CODE> allows the use of a locale category other than <CODE>LC_MESSAGES</CODE>.
But I really don't know where this can be useful.  If the
<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than
the known ones, the result is undefined.  It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.

</P>
<P>
A second ambiguity can arise from the fact that more than one
domain may have the same name.  This can be solved by specifying where the
needed message catalog files can be found.

</P>

<PRE>
char *bindtextdomain (const char *domain_name,
                      const char *dir_name);
</PRE>

<P>
Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below).  In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be by solely using <CODE>textdomain</CODE>).  A
<CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding
associated with <VAR>domain_name</VAR>.  If <VAR>domain_name</VAR> itself is
<CODE>NULL</CODE> nothing happens and a <CODE>NULL</CODE> pointer is returned.  Here
again, as for all the other functions, none of the return
values may be changed!

</P>
<P>
It is important to remember that relative path names for the
<VAR>dir_name</VAR> parameter can be trouble.  Since the path is always
computed relative to the current directory, different results will be
achieved when the program executes a <CODE>chdir</CODE> command.  Relative
paths should always be avoided to avoid dependencies and
unreliabilities.

</P>
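<P>
For the library case mentioned above, a sketch could look like this.  The
domain name <CODE>libdemo</CODE>, the directory, and both helper names are
hypothetical.  The directory is written as an absolute path on purpose, for
the reason just given.

</P>

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

#define LIBDEMO_DOMAIN    "libdemo"                  /* hypothetical */
#define LIBDEMO_LOCALEDIR "/usr/local/share/locale"  /* absolute path */

/* Bind the library's domain to its catalog directory.  Returns 1 if the
   binding reported by bindtextdomain matches the requested directory.  */
int
libdemo_bind (void)
{
  const char *bound = bindtextdomain (LIBDEMO_DOMAIN, LIBDEMO_LOCALEDIR);
  return bound != NULL && strcmp (bound, LIBDEMO_LOCALEDIR) == 0;
}

/* Translate a library message independently of whatever domain the
   application selected with textdomain.  */
const char *
libdemo_gettext (const char *msgid)
{
  assert (msgid != NULL);
  return dgettext (LIBDEMO_DOMAIN, msgid);
}
```

<P>
Because the domain is named explicitly in every lookup, the library never
has to touch the application's global domain.

</P>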

<H3><A NAME="SEC171" HREF="gettext_toc.html#TOC171">11.2.3  Locating Message Catalog Files</A></H3>
<P>
<A NAME="IDX1017"></A>

</P>
<P>
Because many different languages for many different packages have to be
stored, we need some way to attach this information to the message catalog
files.  The way usually used in Unix environments is to encode it
in the file name.  This is also done here.  The directory name given in
<CODE>bindtextdomain</CODE>'s second argument (or the default directory),
followed by the value of the locale, the name of the locale category, and
the domain name are concatenated:

</P>

<PRE>
<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
</PRE>

<P>
The default value for <VAR>dir_name</VAR> is system specific.  For the GNU
library, and for packages adhering to its conventions, it's:

<PRE>
/usr/local/share/locale
</PRE>

<P>
<VAR>locale</VAR> is the value of the locale category whose name is
<CODE>LC_<VAR>category</VAR></CODE>.  For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
<CODE>LC_<VAR>category</VAR></CODE> is always <CODE>LC_MESSAGES</CODE>.<A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
The value of the locale is determined through
<CODE>setlocale (LC_<VAR>category</VAR>, NULL)</CODE>.
<A NAME="DOCF4" HREF="gettext_foot.html#FOOT4">(4)</A>
<CODE>dcgettext</CODE> specifies the locale category by the third argument.

</P>
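<P>
The naming scheme can be illustrated with a small formatting helper.  The
function <CODE>catalog_path</CODE> is hypothetical and not part of the
<CODE>gettext</CODE> interface; it only formats the pattern shown above and
makes no claim about which file names the library actually probes.

</P>

```c
#include <assert.h>
#include <stdio.h>

/* Write "dir_name/locale/category_name/domain_name.mo" into BUF.
   Returns the length the full path needs, as snprintf does; a result
   greater than or equal to BUFLEN means the path was truncated.  */
int
catalog_path (char *buf, size_t buflen,
              const char *dir_name, const char *locale,
              const char *category_name, const char *domain_name)
{
  assert (buf != NULL && buflen > 0);
  assert (dir_name != NULL && locale != NULL);
  assert (category_name != NULL && domain_name != NULL);

  return snprintf (buf, buflen, "%s/%s/%s/%s.mo",
                   dir_name, locale, category_name, domain_name);
}
```

<P>
For the default directory given above, the locale <CODE>de</CODE>, the
category <CODE>LC_MESSAGES</CODE>, and a hypothetical domain
<CODE>hello</CODE>, this yields
<CODE>/usr/local/share/locale/de/LC_MESSAGES/hello.mo</CODE>.

</P>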

<H3><A NAME="SEC172" HREF="gettext_toc.html#TOC172">11.2.4  How to specify the output character set <CODE>gettext</CODE> uses</A></H3>
<P>
<A NAME="IDX1018"></A>
<A NAME="IDX1019"></A>

</P>
<P>
<CODE>gettext</CODE> not only looks up a translation in a message catalog.  It
also converts the translation on the fly to the desired output character
set.  This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

</P>
<P>
The output character set is, by default, the value of <CODE>nl_langinfo
(CODESET)</CODE>, which depends on the <CODE>LC_CTYPE</CODE> part of the current
locale.  But programs which store strings in a locale independent way
(e.g. UTF-8) can request that <CODE>gettext</CODE> and related functions
return the translations in that encoding, by use of the
<CODE>bind_textdomain_codeset</CODE> function.

</P>
<P>
Note that the <VAR>msgid</VAR> argument to <CODE>gettext</CODE> is not subject to
character set conversion.  Also, when <CODE>gettext</CODE> does not find a
translation for <VAR>msgid</VAR>, it returns <VAR>msgid</VAR> unchanged --
independently of the current output character set.  It is therefore
recommended that all <VAR>msgid</VAR>s be US-ASCII strings.

</P>
<P>
<DL>
<DT><U>Function:</U> char * <B>bind_textdomain_codeset</B> <I>(const char *<VAR>domainname</VAR>, const char *<VAR>codeset</VAR>)</I>
<DD><A NAME="IDX1020"></A>
The <CODE>bind_textdomain_codeset</CODE> function can be used to specify the
output character set for message catalogs for domain <VAR>domainname</VAR>.
The <VAR>codeset</VAR> argument must be a valid codeset name which can be used
for the <CODE>iconv_open</CODE> function, or a null pointer.

</P>
<P>
If the <VAR>codeset</VAR> parameter is the null pointer,
<CODE>bind_textdomain_codeset</CODE> returns the currently selected codeset
for the domain with the name <VAR>domainname</VAR>.  It returns <CODE>NULL</CODE> if
no codeset has yet been selected.

</P>
<P>
The <CODE>bind_textdomain_codeset</CODE> function can be used several times.
If used multiple times with the same <VAR>domainname</VAR> argument, the
later call overrides the settings made by the earlier one.

</P>
<P>
The <CODE>bind_textdomain_codeset</CODE> function returns a pointer to a
string containing the name of the selected codeset.  The string is
allocated internally in the function and must not be changed by the
user.  If the system ran out of memory during the execution of
<CODE>bind_textdomain_codeset</CODE>, the return value is <CODE>NULL</CODE> and the
global variable <VAR>errno</VAR> is set accordingly.
</DL>

</P>
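<P>
A program that keeps its strings in UTF-8 internally might request that
encoding as sketched below.  The helper name <CODE>request_utf8</CODE> is
hypothetical, and the sketch assumes that <CODE>UTF-8</CODE> is a codeset
name the system's <CODE>iconv_open</CODE> accepts.

</P>

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/* Ask for translations of DOMAIN_NAME to be returned in UTF-8.
   Returns 1 if the function reports UTF-8 as the selected codeset,
   0 otherwise (for example when memory ran out).  */
int
request_utf8 (const char *domain_name)
{
  const char *codeset;

  assert (domain_name != NULL);

  codeset = bind_textdomain_codeset (domain_name, "UTF-8");
  return codeset != NULL && strcmp (codeset, "UTF-8") == 0;
}
```

<P>
A later call with a null pointer as <VAR>codeset</VAR> can be used to query
the selection without changing it.

</P>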

<H3><A NAME="SEC173" HREF="gettext_toc.html#TOC173">11.2.5  Using contexts for solving ambiguities</A></H3>
<P>
<A NAME="IDX1021"></A>
<A NAME="IDX1022"></A>
<A NAME="IDX1023"></A>
<A NAME="IDX1024"></A>

</P>
<P>
One place where the <CODE>gettext</CODE> functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs).  The
problem is that many of the strings which have to be translated are very
short.  They have to appear in pull-down menus, which restricts the
length.  But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program and might need different translations.  This is
especially true for the one-word strings which are frequently used in
GUI programs.

</P>
<P>
As a consequence many people say that the <CODE>gettext</CODE> approach is
wrong and instead <CODE>catgets</CODE> should be used, which indeed does not
have this problem.  But there is a very simple and powerful method to
handle this kind of problem with the <CODE>gettext</CODE> functions.

</P>
<P>
Contexts can be added to strings to be translated.  A context-dependent
translation lookup is a search for the translation of a given string
that is limited to a given context.  The translation for the same string
in a different context can be different.  The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.

</P>
<P>
The <TT>&lsquo;gettext.h&rsquo;</TT> include file contains the lookup macros for strings
with contexts.  They are implemented as thin macros and inline functions
over the functions from <CODE>&#60;libintl.h&#62;</CODE>.

</P>
<P>
<A NAME="IDX1025"></A>

<PRE>
const char *pgettext (const char *msgctxt, const char *msgid);
</PRE>

<P>
In a call of this macro, <VAR>msgctxt</VAR> and <VAR>msgid</VAR> must be string
literals.  The macro returns the translation of <VAR>msgid</VAR>, restricted
to the context given by <VAR>msgctxt</VAR>.

</P>
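<P>
To give an idea of how such a thin macro over <CODE>&#60;libintl.h&#62;</CODE> can
work, here is a sketch.  It is not the code from <TT>&lsquo;gettext.h&rsquo;</TT>:
the names <CODE>demo_pgettext</CODE>, <CODE>demo_pgettext_aux</CODE>, and
<CODE>DEMO_CONTEXT_GLUE</CODE> are hypothetical, and the use of the
character <CODE>\004</CODE> to join context and message into one lookup key
is stated here as an assumption about the catalog format.

</P>

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

/* Assumed separator between context and message in the lookup key.  */
#define DEMO_CONTEXT_GLUE "\004"

/* Look up the combined key.  An unsuccessful lookup hands back the key
   itself; in that case return the plain MSGID, so the user never sees
   the context prefix.  */
static const char *
demo_pgettext_aux (const char *ctxt_and_id, const char *msgid)
{
  const char *translation;

  assert (ctxt_and_id != NULL && msgid != NULL);

  translation = dcgettext (textdomain (NULL), ctxt_and_id, LC_MESSAGES);
  return translation == ctxt_and_id ? msgid : translation;
}

/* Both arguments must be string literals, because the key is built by
   compile-time string concatenation.  */
#define demo_pgettext(msgctxt, msgid) \
  demo_pgettext_aux (msgctxt DEMO_CONTEXT_GLUE msgid, msgid)
```

<P>
The literal concatenation in the macro is one plausible reason for the
requirement that both arguments be string literals.

</P>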
<P>
The <VAR>msgctxt</VAR> string is visible in the PO file to the translator.
You should try to make it somehow canonical and never changing, because
every time you change an <VAR>msgctxt</VAR>, the translator will have to review
the translation of <VAR>msgid</VAR>.

</P>
<P>
Finding a canonical <VAR>msgctxt</VAR> string that doesn't change over time can
be hard.  But you shouldn't use the file name or class name containing the
<CODE>pgettext</CODE> call -- because it is a common development task to rename
a file or a class, and it shouldn't cause translator work.  Also you shouldn't
use a comment in the form of a complete English sentence as <VAR>msgctxt</VAR> --
because orthography or grammar changes are often applied to such sentences,
and again, it shouldn't force the translator to do a review.

</P>
<P>
The <SAMP>&lsquo;p&rsquo;</SAMP> in <SAMP>&lsquo;pgettext&rsquo;</SAMP> stands for “particular”: <CODE>pgettext</CODE>
fetches a particular translation of the <VAR>msgid</VAR>.

</P>
<P>
<A NAME="IDX1026"></A>
<A NAME="IDX1027"></A>

<PRE>
const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);
</PRE>

<P>
These are generalizations of <CODE>pgettext</CODE>.  They behave similarly to
<CODE>dgettext</CODE> and <CODE>dcgettext</CODE>, respectively.  The <VAR>domain_name</VAR>
argument defines the translation domain.  The <VAR>category</VAR> argument
allows the use of a locale facet other than <CODE>LC_MESSAGES</CODE>.

</P>
552*946379e7Schristos<P>
553*946379e7SchristosAs as example consider the following fictional situation.  A GUI program
554*946379e7Schristoshas a menu bar with the following entries:
555*946379e7Schristos
556*946379e7Schristos</P>
557*946379e7Schristos
558*946379e7Schristos<PRE>
559*946379e7Schristos+------------+------------+--------------------------------------+
560*946379e7Schristos| File       | Printer    |                                      |
561*946379e7Schristos+------------+------------+--------------------------------------+
562*946379e7Schristos| Open     | | Select   |
563*946379e7Schristos| New      | | Open     |
564*946379e7Schristos+----------+ | Connect  |
565*946379e7Schristos             +----------+
566*946379e7Schristos</PRE>
567*946379e7Schristos
568*946379e7Schristos<P>
569*946379e7SchristosTo have the strings <CODE>File</CODE>, <CODE>Printer</CODE>, <CODE>Open</CODE>,
570*946379e7Schristos<CODE>New</CODE>, <CODE>Select</CODE>, and <CODE>Connect</CODE> translated, there has to be
571*946379e7Schristosa call to a function of the <CODE>gettext</CODE> family at some point in the
572*946379e7Schristoscode.  But in two places the string passed into the function would be
573*946379e7Schristos<CODE>Open</CODE>.  The translations might not be the same, and therefore we
574*946379e7Schristosare in the dilemma described above.
575*946379e7Schristos
576*946379e7Schristos</P>
577*946379e7Schristos<P>
578*946379e7SchristosWhat distinguishes the two places is the menu path from the menu root to
579*946379e7Schristosthe particular menu entries:
580*946379e7Schristos
581*946379e7Schristos</P>
582*946379e7Schristos
583*946379e7Schristos<PRE>
584*946379e7SchristosMenu|File
585*946379e7SchristosMenu|Printer
586*946379e7SchristosMenu|File|Open
587*946379e7SchristosMenu|File|New
588*946379e7SchristosMenu|Printer|Select
589*946379e7SchristosMenu|Printer|Open
590*946379e7SchristosMenu|Printer|Connect
591*946379e7Schristos</PRE>
592*946379e7Schristos
593*946379e7Schristos<P>
594*946379e7SchristosThe context is thus the menu path without its last part.  So, the calls
595*946379e7Schristoslook like this:
596*946379e7Schristos
597*946379e7Schristos</P>
598*946379e7Schristos
599*946379e7Schristos<PRE>
600*946379e7Schristospgettext ("Menu|", "File")
601*946379e7Schristospgettext ("Menu|", "Printer")
602*946379e7Schristospgettext ("Menu|File|", "Open")
603*946379e7Schristospgettext ("Menu|File|", "New")
604*946379e7Schristospgettext ("Menu|Printer|", "Select")
605*946379e7Schristospgettext ("Menu|Printer|", "Open")
606*946379e7Schristospgettext ("Menu|Printer|", "Connect")
607*946379e7Schristos</PRE>
608*946379e7Schristos
609*946379e7Schristos<P>
610*946379e7SchristosWhether or not to use the <SAMP>&lsquo;|&rsquo;</SAMP> character at the end of the context is a
611*946379e7Schristosmatter of style.
612*946379e7Schristos
613*946379e7Schristos</P>
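<P>
The following is a minimal sketch of how a context lookup can be built on top
of plain <CODE>gettext</CODE>.  It assumes the convention used by the
<CODE>gettext.h</CODE> header shipped with GNU gettext, where the catalog key is
the <VAR>msgctxt</VAR>, the EOT character (<CODE>\004</CODE>), and the
<VAR>msgid</VAR> glued together.  The function name
<CODE>lookup_with_context</CODE> and the 1024-byte key buffer are illustrative
assumptions, not part of the library.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper (not part of libintl): look up MSGID under MSGCTXT.
   Assumes the "msgctxt EOT msgid" key convention of GNU gettext.h.  */
const char *
lookup_with_context (const char *msgctxt, const char *msgid)
{
  char key[1024];               /* arbitrary size chosen for this sketch */
  int len = snprintf (key, sizeof key, "%s\004%s", msgctxt, msgid);

  /* If the key does not fit, fall back to the untranslated msgid.  */
  if (len < 0 || (size_t) len >= sizeof key)
    return msgid;

  const char *translation = gettext (key);

  /* gettext returns its argument unchanged when no translation exists;
     in that case return the msgid alone, without the context prefix.  */
  return translation == key ? msgid : translation;
}
```

<P>
With no catalog installed, both <CODE>lookup_with_context ("Menu|File|", "Open")</CODE>
and <CODE>lookup_with_context ("Menu|Printer|", "Open")</CODE> return
<CODE>"Open"</CODE>.  In real code the <CODE>pgettext</CODE> macro is preferable,
since <CODE>xgettext</CODE> knows how to extract its arguments.
</P>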
614*946379e7Schristos<P>
615*946379e7SchristosFor more complex cases, where the <VAR>msgctxt</VAR> or <VAR>msgid</VAR> are not
616*946379e7Schristosstring literals, more general macros are available:
617*946379e7Schristos
618*946379e7Schristos</P>
619*946379e7Schristos<P>
620*946379e7Schristos<A NAME="IDX1028"></A>
621*946379e7Schristos<A NAME="IDX1029"></A>
622*946379e7Schristos<A NAME="IDX1030"></A>
623*946379e7Schristos
624*946379e7Schristos<PRE>
625*946379e7Schristosconst char *pgettext_expr (const char *msgctxt, const char *msgid);
626*946379e7Schristosconst char *dpgettext_expr (const char *domain_name,
627*946379e7Schristos                            const char *msgctxt, const char *msgid);
628*946379e7Schristosconst char *dcpgettext_expr (const char *domain_name,
629*946379e7Schristos                             const char *msgctxt, const char *msgid,
630*946379e7Schristos                             int category);
631*946379e7Schristos</PRE>
632*946379e7Schristos
633*946379e7Schristos<P>
634*946379e7SchristosHere <VAR>msgctxt</VAR> and <VAR>msgid</VAR> can be arbitrary string-valued expressions.
635*946379e7SchristosThese macros are more general.  But in the case that both argument expressions
636*946379e7Schristosare string literals, the macros without the <SAMP>&lsquo;_expr&rsquo;</SAMP> suffix are more
637*946379e7Schristosefficient.
638*946379e7Schristos
639*946379e7Schristos</P>
640*946379e7Schristos
641*946379e7Schristos
642*946379e7Schristos<H3><A NAME="SEC174" HREF="gettext_toc.html#TOC174">11.2.6  Additional functions for plural forms</A></H3>
643*946379e7Schristos<P>
644*946379e7Schristos<A NAME="IDX1031"></A>
645*946379e7Schristos
646*946379e7Schristos</P>
647*946379e7Schristos<P>
648*946379e7SchristosThe functions of the <CODE>gettext</CODE> family described so far (and all the
649*946379e7Schristos<CODE>catgets</CODE> functions as well) have one problem in the real world
650*946379e7Schristoswhich has been neglected completely in all existing approaches: the
651*946379e7Schristoshandling of plural forms.
652*946379e7Schristos
653*946379e7Schristos</P>
654*946379e7Schristos<P>
655*946379e7SchristosLooking through Unix source code before the time anybody thought about
656*946379e7Schristosinternationalization (and, sadly, even afterwards) one can often find
657*946379e7Schristoscode similar to the following:
658*946379e7Schristos
659*946379e7Schristos</P>
660*946379e7Schristos
661*946379e7Schristos<PRE>
662*946379e7Schristos   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
663*946379e7Schristos</PRE>
664*946379e7Schristos
665*946379e7Schristos<P>
666*946379e7SchristosAfter the first complaints from people internationalizing the code, programmers
667*946379e7Schristoseither completely avoided formulations like this or used strings like
668*946379e7Schristos<CODE>"file(s)"</CODE>.  Both look unnatural and should be avoided.  The first
669*946379e7Schristosattempts to solve the problem correctly looked like this:
670*946379e7Schristos
671*946379e7Schristos</P>
672*946379e7Schristos
673*946379e7Schristos<PRE>
674*946379e7Schristos   if (n == 1)
675*946379e7Schristos     printf ("%d file deleted", n);
676*946379e7Schristos   else
677*946379e7Schristos     printf ("%d files deleted", n);
678*946379e7Schristos</PRE>
679*946379e7Schristos
680*946379e7Schristos<P>
681*946379e7SchristosBut this does not solve the problem.  It helps languages where the
682*946379e7Schristosplural form of a noun is not simply constructed by adding an
683*946379e7Schristos‘s’
684*946379e7Schristosbut that is all.  Once again people fell into the trap of believing the
685*946379e7Schristosrules their language is using are universal.  But the handling of plural
686*946379e7Schristosforms differs widely between the language families.  For example,
687*946379e7SchristosRafal Maszkowski <CODE>&#60;rzm@mat.uni.torun.pl&#62;</CODE> reports:
688*946379e7Schristos
689*946379e7Schristos</P>
690*946379e7Schristos
691*946379e7Schristos<BLOCKQUOTE>
692*946379e7Schristos<P>
693*946379e7SchristosIn Polish we use e.g. plik (file) this way:
694*946379e7Schristos
695*946379e7Schristos<PRE>
696*946379e7Schristos1 plik
697*946379e7Schristos2,3,4 pliki
698*946379e7Schristos5-21 pliko'w
699*946379e7Schristos22-24 pliki
700*946379e7Schristos25-31 pliko'w
701*946379e7Schristos</PRE>
702*946379e7Schristos
703*946379e7Schristos<P>
704*946379e7Schristosand so on (o' means 8859-2 oacute which should be rather okreska,
705*946379e7Schristossimilar to aogonek).
706*946379e7Schristos</BLOCKQUOTE>
707*946379e7Schristos
708*946379e7Schristos<P>
709*946379e7SchristosThere are two things which can differ between languages (and even inside
710*946379e7Schristoslanguage families):
711*946379e7Schristos
712*946379e7Schristos</P>
713*946379e7Schristos
714*946379e7Schristos<UL>
715*946379e7Schristos<LI>
716*946379e7Schristos
717*946379e7SchristosThe way plural forms are built differs.  This is a problem with
718*946379e7Schristoslanguages which have many irregularities.  German, for instance, is a
719*946379e7Schristosdrastic case.  Though English and German are part of the same language
720*946379e7Schristosfamily (Germanic), the almost regular forming of plural noun forms
721*946379e7Schristos(appending an
722*946379e7Schristos‘s’)
723*946379e7Schristosis hardly found in German.
724*946379e7Schristos
725*946379e7Schristos<LI>
726*946379e7Schristos
727*946379e7SchristosThe number of plural forms differs.  This is somewhat surprising for
728*946379e7Schristosthose who only have experience with Romanic and Germanic languages
729*946379e7Schristossince here the number is the same (there are two).
730*946379e7Schristos
731*946379e7SchristosBut other language families have only one form or many forms.  More
732*946379e7Schristosinformation on this in an extra section.
733*946379e7Schristos</UL>
734*946379e7Schristos
735*946379e7Schristos<P>
736*946379e7SchristosThe consequence of this is that application writers should not try to
737*946379e7Schristossolve the problem in their code.  This would be localization since it is
738*946379e7Schristosonly usable for certain, hardcoded language environments.  Instead the
739*946379e7Schristosextended <CODE>gettext</CODE> interface should be used.
740*946379e7Schristos
741*946379e7Schristos</P>
742*946379e7Schristos<P>
743*946379e7SchristosInstead of the one key string, these extra functions take two
744*946379e7Schristosstrings and a numerical argument.  The idea behind this is that, using
745*946379e7Schristosthe numerical argument and the first string as a key, the implementation
746*946379e7Schristoscan select the right plural form using rules specified by the translator.
747*946379e7SchristosThe two string arguments are then used to provide a return
748*946379e7Schristosvalue in case no message catalog is found (similar to the normal
749*946379e7Schristos<CODE>gettext</CODE> behavior).  In this case the rules for Germanic languages
750*946379e7Schristosare used and it is assumed that the first string argument is the singular
751*946379e7Schristosform, the second the plural form.
752*946379e7Schristos
753*946379e7Schristos</P>
754*946379e7Schristos<P>
755*946379e7SchristosThis has the consequence that programs without language catalogs can
756*946379e7Schristosdisplay the correct strings only if the program itself is written using
757*946379e7Schristosa Germanic language.  This is a limitation but since the GNU C library
758*946379e7Schristos(as well as the GNU <CODE>gettext</CODE> package) is written as part of the
759*946379e7SchristosGNU package and the coding standards for the GNU project require programs
760*946379e7Schristosto be written in English, this solution nevertheless fulfills its
761*946379e7Schristospurpose.
762*946379e7Schristos
763*946379e7Schristos</P>
764*946379e7Schristos<P>
765*946379e7Schristos<DL>
766*946379e7Schristos<DT><U>Function:</U> char * <B>ngettext</B> <I>(const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
767*946379e7Schristos<DD><A NAME="IDX1032"></A>
768*946379e7SchristosThe <CODE>ngettext</CODE> function is similar to the <CODE>gettext</CODE> function
769*946379e7Schristosas it finds the message catalogs in the same way.  But it takes two
770*946379e7Schristosextra arguments.  The <VAR>msgid1</VAR> parameter must contain the singular
771*946379e7Schristosform of the string to be converted.  It is also used as the key for the
772*946379e7Schristossearch in the catalog.  The <VAR>msgid2</VAR> parameter is the plural form.
773*946379e7SchristosThe parameter <VAR>n</VAR> is used to determine the plural form.  If no
774*946379e7Schristosmessage catalog is found <VAR>msgid1</VAR> is returned if <CODE>n == 1</CODE>,
775*946379e7Schristosotherwise <VAR>msgid2</VAR>.
776*946379e7Schristos
777*946379e7Schristos</P>
778*946379e7Schristos<P>
779*946379e7SchristosAn example for the use of this function is:
780*946379e7Schristos
781*946379e7Schristos</P>
782*946379e7Schristos
783*946379e7Schristos<PRE>
784*946379e7Schristosprintf (ngettext ("%d file removed", "%d files removed", n), n);
785*946379e7Schristos</PRE>
786*946379e7Schristos
787*946379e7Schristos<P>
788*946379e7SchristosPlease note that the numeric value <VAR>n</VAR> has to be passed to the
789*946379e7Schristos<CODE>printf</CODE> function as well.  It is not sufficient to pass it only to
790*946379e7Schristos<CODE>ngettext</CODE>.
791*946379e7Schristos
792*946379e7Schristos</P>
793*946379e7Schristos<P>
794*946379e7SchristosIn the English singular case, the number -- always 1 -- can be replaced with
795*946379e7Schristos"one":
796*946379e7Schristos
797*946379e7Schristos</P>
798*946379e7Schristos
799*946379e7Schristos<PRE>
800*946379e7Schristosprintf (ngettext ("One file removed", "%d files removed", n), n);
801*946379e7Schristos</PRE>
802*946379e7Schristos
803*946379e7Schristos<P>
804*946379e7SchristosThis works because the <SAMP>&lsquo;printf&rsquo;</SAMP> function discards excess arguments that
805*946379e7Schristosare not consumed by the format string.
806*946379e7Schristos
807*946379e7Schristos</P>
808*946379e7Schristos<P>
809*946379e7SchristosIt is also possible to use this function when the strings don't contain a
810*946379e7Schristoscardinal number:
811*946379e7Schristos
812*946379e7Schristos</P>
813*946379e7Schristos
814*946379e7Schristos<PRE>
815*946379e7Schristosputs (ngettext ("Delete the selected file?",
816*946379e7Schristos                "Delete the selected files?",
817*946379e7Schristos                n));
818*946379e7Schristos</PRE>
819*946379e7Schristos
820*946379e7Schristos<P>
821*946379e7SchristosIn this case the number <VAR>n</VAR> is only used to choose the plural form.
822*946379e7Schristos</DL>
823*946379e7Schristos
824*946379e7Schristos</P>
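<P>
As a small illustration of the fallback behavior described above, the
following sketch wraps the <CODE>ngettext</CODE> call in a helper that formats
into a buffer.  The helper name <CODE>format_removed</CODE> is an assumption made
for this example, and the sketch uses <CODE>%lu</CODE> with an
<SAMP>&lsquo;unsigned long&rsquo;</SAMP> count so that the same value can be
passed to both <CODE>ngettext</CODE> and <CODE>snprintf</CODE>.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: write the "files removed" message for N into BUF.
   N is passed twice: once to ngettext to choose the form, and once to
   snprintf to fill in the number.  */
int
format_removed (char *buf, size_t size, unsigned long n)
{
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

<P>
When no message catalog is found, the Germanic fallback applies: a count of 1
selects the first string and any other count, including 0, selects the second.
</P>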
825*946379e7Schristos<P>
826*946379e7Schristos<DL>
827*946379e7Schristos<DT><U>Function:</U> char * <B>dngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)</I>
828*946379e7Schristos<DD><A NAME="IDX1033"></A>
829*946379e7SchristosThe <CODE>dngettext</CODE> function is similar to the <CODE>dgettext</CODE> function in the
830*946379e7Schristosway the message catalog is selected.  The difference is that it takes
831*946379e7Schristostwo extra parameters to provide the correct plural form.  These two
832*946379e7Schristosparameters are handled in the same way <CODE>ngettext</CODE> handles them.
833*946379e7Schristos</DL>
834*946379e7Schristos
835*946379e7Schristos</P>
836*946379e7Schristos<P>
837*946379e7Schristos<DL>
838*946379e7Schristos<DT><U>Function:</U> char * <B>dcngettext</B> <I>(const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>, int <VAR>category</VAR>)</I>
839*946379e7Schristos<DD><A NAME="IDX1034"></A>
840*946379e7SchristosThe <CODE>dcngettext</CODE> function is similar to the <CODE>dcgettext</CODE> function in the
841*946379e7Schristosway the message catalog is selected.  The difference is that it takes
842*946379e7Schristostwo extra parameters to provide the correct plural form.  These two
843*946379e7Schristosparameters are handled in the same way <CODE>ngettext</CODE> handles them.
844*946379e7Schristos</DL>
845*946379e7Schristos
846*946379e7Schristos</P>
847*946379e7Schristos<P>
848*946379e7SchristosNow, how do these functions solve the problem of the plural forms?
849*946379e7SchristosWithout the input of linguists (which was not available) it was not
850*946379e7Schristospossible to determine whether there are only a few different ways in
851*946379e7Schristoswhich plural forms are built or whether the number can increase with
852*946379e7Schristosevery new supported language.
853*946379e7Schristos
854*946379e7Schristos</P>
855*946379e7Schristos<P>
856*946379e7SchristosTherefore the solution implemented is to allow the translator to specify
857*946379e7Schristosthe rules of how to select the plural form.  Since the formula varies
858*946379e7Schristoswith every language this is the only viable solution except for
859*946379e7Schristoshardcoding the information in the code (which still would require the
860*946379e7Schristospossibility of extensions to not prevent the use of new languages).
861*946379e7Schristos
862*946379e7Schristos</P>
863*946379e7Schristos<P>
864*946379e7Schristos<A NAME="IDX1035"></A>
865*946379e7Schristos<A NAME="IDX1036"></A>
866*946379e7Schristos<A NAME="IDX1037"></A>
867*946379e7SchristosThe information about the plural form selection has to be stored in the
868*946379e7Schristosheader entry of the PO file (the one with the empty <CODE>msgid</CODE> string).
869*946379e7SchristosThe plural form information looks like this:
870*946379e7Schristos
871*946379e7Schristos</P>
872*946379e7Schristos
873*946379e7Schristos<PRE>
874*946379e7SchristosPlural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
875*946379e7Schristos</PRE>
876*946379e7Schristos
877*946379e7Schristos<P>
878*946379e7SchristosThe <CODE>nplurals</CODE> value must be a decimal number which specifies how
879*946379e7Schristosmany different plural forms exist for this language.  The string
880*946379e7Schristosfollowing <CODE>plural</CODE> is an expression which is using the C language
881*946379e7Schristossyntax.  Exceptions are that no negative numbers are allowed, numbers
882*946379e7Schristosmust be decimal, and the only variable allowed is <CODE>n</CODE>.  Spaces are
883*946379e7Schristosallowed in the expression, but backslash-newlines are not; in the
884*946379e7Schristosexamples below the backslash-newlines are present for formatting purposes
885*946379e7Schristosonly.  This expression will be evaluated whenever one of the functions
886*946379e7Schristos<CODE>ngettext</CODE>, <CODE>dngettext</CODE>, or <CODE>dcngettext</CODE> is called.  The
887*946379e7Schristosnumeric value passed to these functions is then substituted for all uses
888*946379e7Schristosof the variable <CODE>n</CODE> in the expression.  The resulting value then
889*946379e7Schristosmust be greater than or equal to zero and smaller than the value given as the
890*946379e7Schristosvalue of <CODE>nplurals</CODE>.
891*946379e7Schristos
892*946379e7Schristos</P>
893*946379e7Schristos<P>
894*946379e7Schristos<A NAME="IDX1038"></A>
895*946379e7SchristosThe following rules are known at this point.  The languages are listed
896*946379e7Schristoswith their families.  But this does not necessarily mean the information can be
897*946379e7Schristosgeneralized for the whole family (as can be easily seen in the table
898*946379e7Schristosbelow).<A NAME="DOCF5" HREF="gettext_foot.html#FOOT5">(5)</A>
899*946379e7Schristos
900*946379e7Schristos</P>
901*946379e7Schristos<DL COMPACT>
902*946379e7Schristos
903*946379e7Schristos<DT>Only one form:
904*946379e7Schristos<DD>
905*946379e7SchristosSome languages only require one single form.  There is no distinction
906*946379e7Schristosbetween the singular and plural form.  An appropriate header entry
907*946379e7Schristoswould look like this:
908*946379e7Schristos
909*946379e7Schristos
910*946379e7Schristos<PRE>
911*946379e7SchristosPlural-Forms: nplurals=1; plural=0;
912*946379e7Schristos</PRE>
913*946379e7Schristos
914*946379e7SchristosLanguages with this property include:
915*946379e7Schristos
916*946379e7Schristos<DL COMPACT>
917*946379e7Schristos
918*946379e7Schristos<DT>Asian family
919*946379e7Schristos<DD>
920*946379e7SchristosJapanese, Korean, Vietnamese
921*946379e7Schristos<DT>Turkic/Altaic family
922*946379e7Schristos<DD>
923*946379e7SchristosTurkish
924*946379e7Schristos</DL>
925*946379e7Schristos
926*946379e7Schristos<DT>Two forms, singular used for one only
927*946379e7Schristos<DD>
928*946379e7SchristosThis is the form used in most existing programs since it is what English
929*946379e7Schristosis using.  A header entry would look like this:
930*946379e7Schristos
931*946379e7Schristos
932*946379e7Schristos<PRE>
933*946379e7SchristosPlural-Forms: nplurals=2; plural=n != 1;
934*946379e7Schristos</PRE>
935*946379e7Schristos
936*946379e7Schristos(Note: this uses the feature of C expressions that boolean expressions
937*946379e7Schristoshave the value zero or one.)
938*946379e7Schristos
939*946379e7SchristosLanguages with this property include:
940*946379e7Schristos
941*946379e7Schristos<DL COMPACT>
942*946379e7Schristos
943*946379e7Schristos<DT>Germanic family
944*946379e7Schristos<DD>
945*946379e7SchristosDanish, Dutch, English, Faroese, German, Norwegian, Swedish
946*946379e7Schristos<DT>Finno-Ugric family
947*946379e7Schristos<DD>
948*946379e7SchristosEstonian, Finnish
949*946379e7Schristos<DT>Latin/Greek family
950*946379e7Schristos<DD>
951*946379e7SchristosGreek
952*946379e7Schristos<DT>Semitic family
953*946379e7Schristos<DD>
954*946379e7SchristosHebrew
955*946379e7Schristos<DT>Romanic family
956*946379e7Schristos<DD>
957*946379e7SchristosItalian, Portuguese, Spanish
958*946379e7Schristos<DT>Artificial
959*946379e7Schristos<DD>
960*946379e7SchristosEsperanto
961*946379e7Schristos</DL>
962*946379e7Schristos
963*946379e7SchristosAnother language using the same header entry is:
964*946379e7Schristos
965*946379e7Schristos<DL COMPACT>
966*946379e7Schristos
967*946379e7Schristos<DT>Finno-Ugric family
968*946379e7Schristos<DD>
969*946379e7SchristosHungarian
970*946379e7Schristos</DL>
971*946379e7Schristos
972*946379e7SchristosHungarian does not appear to have a plural if you look at sentences involving
973*946379e7Schristoscardinal numbers.  For example, “1 apple” is “1 alma”, and “123 apples” is
974*946379e7Schristos“123 alma”.  But when the number is not explicit, the distinction between
975*946379e7Schristossingular and plural exists: “the apple” is “az alma”, and “the apples” is
976*946379e7Schristos“az almák”.  Since <CODE>ngettext</CODE> has to support both types of sentences,
977*946379e7Schristosit is classified here, under “two forms”.
978*946379e7Schristos
979*946379e7Schristos<DT>Two forms, singular used for zero and one
980*946379e7Schristos<DD>
981*946379e7SchristosExceptional case in the language family.  The header entry would be:
982*946379e7Schristos
983*946379e7Schristos
984*946379e7Schristos<PRE>
985*946379e7SchristosPlural-Forms: nplurals=2; plural=n&#62;1;
986*946379e7Schristos</PRE>
987*946379e7Schristos
988*946379e7SchristosLanguages with this property include:
989*946379e7Schristos
990*946379e7Schristos<DL COMPACT>
991*946379e7Schristos
992*946379e7Schristos<DT>Romanic family
993*946379e7Schristos<DD>
994*946379e7SchristosFrench, Brazilian Portuguese
995*946379e7Schristos</DL>
996*946379e7Schristos
997*946379e7Schristos<DT>Three forms, special case for zero
998*946379e7Schristos<DD>
999*946379e7SchristosThe header entry would be:
1000*946379e7Schristos
1001*946379e7Schristos
1002*946379e7Schristos<PRE>
1003*946379e7SchristosPlural-Forms: nplurals=3; plural=n%10==1 &#38;&#38; n%100!=11 ? 0 : n != 0 ? 1 : 2;
1004*946379e7Schristos</PRE>
1005*946379e7Schristos
1006*946379e7SchristosLanguages with this property include:
1007*946379e7Schristos
1008*946379e7Schristos<DL COMPACT>
1009*946379e7Schristos
1010*946379e7Schristos<DT>Baltic family
1011*946379e7Schristos<DD>
1012*946379e7SchristosLatvian
1013*946379e7Schristos</DL>
1014*946379e7Schristos
1015*946379e7Schristos<DT>Three forms, special cases for one and two
1016*946379e7Schristos<DD>
1017*946379e7SchristosThe header entry would be:
1018*946379e7Schristos
1019*946379e7Schristos
1020*946379e7Schristos<PRE>
1021*946379e7SchristosPlural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
1022*946379e7Schristos</PRE>
1023*946379e7Schristos
1024*946379e7SchristosLanguages with this property include:
1025*946379e7Schristos
1026*946379e7Schristos<DL COMPACT>
1027*946379e7Schristos
1028*946379e7Schristos<DT>Celtic
1029*946379e7Schristos<DD>
1030*946379e7SchristosGaeilge (Irish)
1031*946379e7Schristos</DL>
1032*946379e7Schristos
1033*946379e7Schristos<DT>Three forms, special case for numbers ending in 00 or [2-9][0-9]
1034*946379e7Schristos<DD>
1035*946379e7SchristosThe header entry would be:
1036*946379e7Schristos
1037*946379e7Schristos
1038*946379e7Schristos<PRE>
1039*946379e7SchristosPlural-Forms: nplurals=3; \
1040*946379e7Schristos    plural=n==1 ? 0 : (n==0 || (n%100 &#62; 0 &#38;&#38; n%100 &#60; 20)) ? 1 : 2;
1041*946379e7Schristos</PRE>
1042*946379e7Schristos
1043*946379e7SchristosLanguages with this property include:
1044*946379e7Schristos
1045*946379e7Schristos<DL COMPACT>
1046*946379e7Schristos
1047*946379e7Schristos<DT>Romanic family
1048*946379e7Schristos<DD>
1049*946379e7SchristosRomanian
1050*946379e7Schristos</DL>
1051*946379e7Schristos
1052*946379e7Schristos<DT>Three forms, special case for numbers ending in 1[2-9]
1053*946379e7Schristos<DD>
1054*946379e7SchristosThe header entry would look like this:
1055*946379e7Schristos
1056*946379e7Schristos
1057*946379e7Schristos<PRE>
1058*946379e7SchristosPlural-Forms: nplurals=3; \
1059*946379e7Schristos    plural=n%10==1 &#38;&#38; n%100!=11 ? 0 : \
1060*946379e7Schristos           n%10&#62;=2 &#38;&#38; (n%100&#60;10 || n%100&#62;=20) ? 1 : 2;
1061*946379e7Schristos</PRE>
1062*946379e7Schristos
1063*946379e7SchristosLanguages with this property include:
1064*946379e7Schristos
1065*946379e7Schristos<DL COMPACT>
1066*946379e7Schristos
1067*946379e7Schristos<DT>Baltic family
1068*946379e7Schristos<DD>
1069*946379e7SchristosLithuanian
1070*946379e7Schristos</DL>
1071*946379e7Schristos
1072*946379e7Schristos<DT>Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
1073*946379e7Schristos<DD>
1074*946379e7SchristosThe header entry would look like this:
1075*946379e7Schristos
1076*946379e7Schristos
1077*946379e7Schristos<PRE>
1078*946379e7SchristosPlural-Forms: nplurals=3; \
1079*946379e7Schristos    plural=n%10==1 &#38;&#38; n%100!=11 ? 0 : \
1080*946379e7Schristos           n%10&#62;=2 &#38;&#38; n%10&#60;=4 &#38;&#38; (n%100&#60;10 || n%100&#62;=20) ? 1 : 2;
1081*946379e7Schristos</PRE>
1082*946379e7Schristos
1083*946379e7SchristosLanguages with this property include:
1084*946379e7Schristos
1085*946379e7Schristos<DL COMPACT>
1086*946379e7Schristos
1087*946379e7Schristos<DT>Slavic family
1088*946379e7Schristos<DD>
1089*946379e7SchristosCroatian, Serbian, Russian, Ukrainian
1090*946379e7Schristos</DL>
1091*946379e7Schristos
1092*946379e7Schristos<DT>Three forms, special cases for 1 and 2, 3, 4
1093*946379e7Schristos<DD>
1094*946379e7SchristosThe header entry would look like this:
1095*946379e7Schristos
1096*946379e7Schristos
1097*946379e7Schristos<PRE>
1098*946379e7SchristosPlural-Forms: nplurals=3; \
1099*946379e7Schristos    plural=(n==1) ? 0 : (n&#62;=2 &#38;&#38; n&#60;=4) ? 1 : 2;
1100*946379e7Schristos</PRE>
1101*946379e7Schristos
1102*946379e7SchristosLanguages with this property include:
1103*946379e7Schristos
1104*946379e7Schristos<DL COMPACT>
1105*946379e7Schristos
1106*946379e7Schristos<DT>Slavic family
1107*946379e7Schristos<DD>
1108*946379e7SchristosSlovak, Czech
1109*946379e7Schristos</DL>
1110*946379e7Schristos
1111*946379e7Schristos<DT>Three forms, special case for one and some numbers ending in 2, 3, or 4
1112*946379e7Schristos<DD>
1113*946379e7SchristosThe header entry would look like this:
1114*946379e7Schristos
1115*946379e7Schristos
1116*946379e7Schristos<PRE>
1117*946379e7SchristosPlural-Forms: nplurals=3; \
1118*946379e7Schristos    plural=n==1 ? 0 : \
1119*946379e7Schristos           n%10&#62;=2 &#38;&#38; n%10&#60;=4 &#38;&#38; (n%100&#60;10 || n%100&#62;=20) ? 1 : 2;
1120*946379e7Schristos</PRE>
1121*946379e7Schristos
1122*946379e7SchristosLanguages with this property include:
1123*946379e7Schristos
1124*946379e7Schristos<DL COMPACT>
1125*946379e7Schristos
1126*946379e7Schristos<DT>Slavic family
1127*946379e7Schristos<DD>
1128*946379e7SchristosPolish
1129*946379e7Schristos</DL>
1130*946379e7Schristos
1131*946379e7Schristos<DT>Four forms, special case for one and all numbers ending in 02, 03, or 04
1132*946379e7Schristos<DD>
1133*946379e7SchristosThe header entry would look like this:
1134*946379e7Schristos
1135*946379e7Schristos
1136*946379e7Schristos<PRE>
1137*946379e7SchristosPlural-Forms: nplurals=4; \
1138*946379e7Schristos    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
1139*946379e7Schristos</PRE>
1140*946379e7Schristos
1141*946379e7SchristosLanguages with this property include:
1142*946379e7Schristos
1143*946379e7Schristos<DL COMPACT>
1144*946379e7Schristos
1145*946379e7Schristos<DT>Slavic family
1146*946379e7Schristos<DD>
1147*946379e7SchristosSlovenian
1148*946379e7Schristos</DL>
1149*946379e7Schristos</DL>
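<P>
Because the <CODE>plural</CODE> expressions use C syntax, a rule from the table
can be tried out directly as a C function.  The sketch below transcribes three
of the rules listed above; the function names are illustrative assumptions, and
each function returns the index of the plural form that would be selected.
</P>

```c
/* plural=n != 1  (English, German, and the other "two forms" languages) */
unsigned long
plural_germanic (unsigned long n)
{
  return n != 1;
}

/* plural=n>1  (French, Brazilian Portuguese) */
unsigned long
plural_french (unsigned long n)
{
  return n > 1;
}

/* plural=n==1 ? 0 :
          n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2  (Polish) */
unsigned long
plural_polish (unsigned long n)
{
  return n == 1 ? 0
         : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}
```

<P>
Checked against the Polish example quoted earlier, <CODE>plural_polish</CODE>
yields form 0 for 1 (plik), form 1 for 2, 3, 4 and 22 to 24 (pliki), and
form 2 for 5 to 21 and 25 to 31.
</P>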
1150*946379e7Schristos
1151*946379e7Schristos<P>
1152*946379e7SchristosYou might now ask, <CODE>ngettext</CODE> handles only numbers <VAR>n</VAR> of type
1153*946379e7Schristos<SAMP>&lsquo;unsigned long&rsquo;</SAMP>.  What about larger integer types?  What about negative
1154*946379e7Schristosnumbers?  What about floating-point numbers?
1155*946379e7Schristos
1156*946379e7Schristos</P>
1157*946379e7Schristos<P>
1158*946379e7SchristosAbout larger integer types, such as <SAMP>&lsquo;uintmax_t&rsquo;</SAMP> or
1159*946379e7Schristos<SAMP>&lsquo;unsigned long long&rsquo;</SAMP>: they can be handled by reducing the value to a
1160*946379e7Schristosrange that fits in an <SAMP>&lsquo;unsigned long&rsquo;</SAMP>.  Simply casting the value to
1161*946379e7Schristos<SAMP>&lsquo;unsigned long&rsquo;</SAMP> would not do the right thing, since it would treat
1162*946379e7Schristos<CODE>ULONG_MAX + 1</CODE> like zero, <CODE>ULONG_MAX + 2</CODE> like singular, and
1163*946379e7Schristosthe like.  Here you can exploit the fact that all mentioned plural form
1164*946379e7Schristosformulas eventually become periodic, with a period that is a divisor of 100
1165*946379e7Schristos(or 1000 or 1000000).  So, when you reduce a large value to another one in
1166*946379e7Schristosthe range [1000000, 1999999] that ends in the same 6 decimal digits, you
1167*946379e7Schristoscan assume that it will lead to the same plural form selection.  This code
1168*946379e7Schristosdoes this:
1169*946379e7Schristos
1170*946379e7Schristos</P>
1171*946379e7Schristos
1172*946379e7Schristos<PRE>
1173*946379e7Schristos#include &#60;inttypes.h&#62;
1174*946379e7Schristosuintmax_t nbytes = ...;
1175*946379e7Schristosprintf (ngettext ("The file has %"PRIuMAX" byte.",
1176*946379e7Schristos                  "The file has %"PRIuMAX" bytes.",
1177*946379e7Schristos                  (nbytes &#62; ULONG_MAX
1178*946379e7Schristos                   ? (nbytes % 1000000) + 1000000
1179*946379e7Schristos                   : nbytes)),
1180*946379e7Schristos        nbytes);
1181*946379e7Schristos</PRE>
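<P>
The same reduction can be factored into a small helper, sketched here under
the assumption that a period of 1000000 covers every plural formula in use.
The name <CODE>plural_argument</CODE> is hypothetical.  Note that
<CODE>ULONG_MAX</CODE> is declared in <CODE>limits.h</CODE>, which must be
included in addition to <CODE>inttypes.h</CODE>.
</P>

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: reduce N to a value that fits in an unsigned long
   and is expected to select the same plural form.  Values above ULONG_MAX
   are mapped into [1000000, 1999999], keeping the last 6 decimal digits.  */
unsigned long
plural_argument (uintmax_t n)
{
  if (n > ULONG_MAX)
    return (unsigned long) (n % 1000000 + 1000000);
  return (unsigned long) n;
}
```

<P>
On platforms where <SAMP>&lsquo;unsigned long&rsquo;</SAMP> is as wide as
<SAMP>&lsquo;uintmax_t&rsquo;</SAMP> the first branch is never taken and the
value is passed through unchanged.
</P>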
1182*946379e7Schristos
1183*946379e7Schristos<P>
1184*946379e7SchristosNegative and floating-point values usually represent physical entities for
1185*946379e7Schristoswhich singular and plural don't clearly apply.  In such cases, there is no
1186*946379e7Schristosneed to use <CODE>ngettext</CODE>; a simple <CODE>gettext</CODE> call with a form suitable
1187*946379e7Schristosfor all values will do.  For example:
1188*946379e7Schristos
1189*946379e7Schristos</P>
1190*946379e7Schristos
1191*946379e7Schristos<PRE>
1192*946379e7Schristosprintf (gettext ("Time elapsed: %.3f seconds"),
1193*946379e7Schristos        num_milliseconds * 0.001);
1194*946379e7Schristos</PRE>
1195*946379e7Schristos
1196*946379e7Schristos<P>
1197*946379e7SchristosEven if <VAR>num_milliseconds</VAR> happens to be a multiple of 1000, the output
1198*946379e7Schristos
1199*946379e7Schristos<PRE>
1200*946379e7SchristosTime elapsed: 1.000 seconds
1201*946379e7Schristos</PRE>
1202*946379e7Schristos
1203*946379e7Schristos<P>
1204*946379e7Schristosis acceptable in English, and similarly for other languages.
1205*946379e7Schristos
1206*946379e7Schristos</P>
1207*946379e7Schristos
1208*946379e7Schristos
1209*946379e7Schristos<H3><A NAME="SEC175" HREF="gettext_toc.html#TOC175">11.2.7  Optimization of the *gettext functions</A></H3>
1210*946379e7Schristos<P>
1211*946379e7Schristos<A NAME="IDX1039"></A>
1212*946379e7Schristos
1213*946379e7Schristos</P>
1214*946379e7Schristos<P>
At this point in the discussion we should mention an advantage of the
GNU <CODE>gettext</CODE> implementation.  Some readers might point out
that an internationalized program may perform poorly if some
string has to be translated in an inner loop.  While this is unavoidable
when the string varies from one iteration of the loop to the next, it is
simply a waste of time when the string is always the same.  Take the
following example:
1222*946379e7Schristos
1223*946379e7Schristos</P>
1224*946379e7Schristos
1225*946379e7Schristos<PRE>
1226*946379e7Schristos{
1227*946379e7Schristos  while (...)
1228*946379e7Schristos    {
1229*946379e7Schristos      puts (gettext ("Hello world"));
1230*946379e7Schristos    }
1231*946379e7Schristos}
1232*946379e7Schristos</PRE>
1233*946379e7Schristos
1234*946379e7Schristos<P>
When the locale selection does not change between two iterations, the
resulting string is always the same.  One way to exploit this is:
1237*946379e7Schristos
1238*946379e7Schristos</P>
1239*946379e7Schristos
1240*946379e7Schristos<PRE>
1241*946379e7Schristos{
1242*946379e7Schristos  str = gettext ("Hello world");
1243*946379e7Schristos  while (...)
1244*946379e7Schristos    {
1245*946379e7Schristos      puts (str);
1246*946379e7Schristos    }
1247*946379e7Schristos}
1248*946379e7Schristos</PRE>
1249*946379e7Schristos
1250*946379e7Schristos<P>
But this solution is not usable in all situations (e.g. when the locale
selection changes), nor does it lead to legible code.
1253*946379e7Schristos
1254*946379e7Schristos</P>
1255*946379e7Schristos<P>
1256*946379e7SchristosFor this reason, GNU <CODE>gettext</CODE> caches previous translation results.
1257*946379e7SchristosWhen the same translation is requested twice, with no new message
1258*946379e7Schristoscatalogs being loaded in between, <CODE>gettext</CODE> will, the second time,
1259*946379e7Schristosfind the result through a single cache lookup.
1260*946379e7Schristos
1261*946379e7Schristos</P>
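<P>
To make the idea of a cache lookup concrete, here is a toy sketch of a
one-entry cache.  It is not the code GNU <CODE>gettext</CODE> uses
internally; the names <CODE>cached_gettext</CODE> and
<CODE>catalog_generation</CODE> are hypothetical.  The cache is keyed by the
address of the message, and it is invalidated whenever the application
increments the generation counter, for instance after changing the locale.

</P>

<PRE>
#include &#60;assert.h&#62;
#include &#60;libintl.h&#62;
#include &#60;stddef.h&#62;

/* Hypothetical counter; the application increments it whenever it
   changes the locale or loads a new message catalog.  */
static unsigned int catalog_generation;

/* Toy one-entry cache: repeat the real lookup only when a different
   message is requested or the generation counter has moved.  */
static const char *
cached_gettext (const char *msgid)
{
  static const char *last_msgid;
  static const char *last_result;
  static unsigned int last_generation;

  assert (msgid != NULL);
  if (msgid != last_msgid || last_generation != catalog_generation)
    {
      last_result = gettext (msgid);
      last_msgid = msgid;
      last_generation = catalog_generation;
    }
  return last_result;
}
</PRE>

<P>
Because GNU <CODE>gettext</CODE> already caches its results, a wrapper like
this is normally unnecessary; it only illustrates why the second request
is cheap.

</P>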
1262*946379e7Schristos
1263*946379e7Schristos
1264*946379e7Schristos<H2><A NAME="SEC176" HREF="gettext_toc.html#TOC176">11.3  Comparing the Two Interfaces</A></H2>
1265*946379e7Schristos<P>
1266*946379e7Schristos<A NAME="IDX1040"></A>
1267*946379e7Schristos<A NAME="IDX1041"></A>
1268*946379e7Schristos
1269*946379e7Schristos</P>
1270*946379e7Schristos
1271*946379e7Schristos<P>
The following discussion is perhaps a little bit colored.  As said
above, we implemented GNU <CODE>gettext</CODE> following the Uniforum
proposal, and we had our reasons for doing so.  The discussion should
nevertheless show how we came to this decision.
1276*946379e7Schristos
1277*946379e7Schristos</P>
1278*946379e7Schristos<P>
First we take a look at the development process.  When we write an
application using NLS provided by <CODE>gettext</CODE> we proceed as always.
Only when we come to a string which might be seen by the users, and thus
has to be translated, do we use <CODE>gettext("...")</CODE> instead of
<CODE>"..."</CODE>.  At the beginning of each source file (or in a central
header file) we define
1285*946379e7Schristos
1286*946379e7Schristos</P>
1287*946379e7Schristos
1288*946379e7Schristos<PRE>
1289*946379e7Schristos#define gettext(String) (String)
1290*946379e7Schristos</PRE>
1291*946379e7Schristos
1292*946379e7Schristos<P>
Even this definition can be avoided when the system supports the
<CODE>gettext</CODE> function in its C library.  When we compile this code the
result is the same as if no NLS code were used.  When you take a look at
the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE>
instead of <CODE>gettext("...")</CODE>.  This reduces the number of
additional characters per translatable string to <EM>3</EM> (in words:
three).
1300*946379e7Schristos
1301*946379e7Schristos</P>
1302*946379e7Schristos<P>
When a production version of the program is needed, we simply replace
the definition
1305*946379e7Schristos
1306*946379e7Schristos</P>
1307*946379e7Schristos
1308*946379e7Schristos<PRE>
1309*946379e7Schristos#define _(String) (String)
1310*946379e7Schristos</PRE>
1311*946379e7Schristos
1312*946379e7Schristos<P>
1313*946379e7Schristosby
1314*946379e7Schristos
1315*946379e7Schristos</P>
1316*946379e7Schristos<P>
1317*946379e7Schristos<A NAME="IDX1042"></A>
1318*946379e7Schristos
1319*946379e7Schristos<PRE>
1320*946379e7Schristos#include &#60;libintl.h&#62;
1321*946379e7Schristos#define _(String) gettext (String)
1322*946379e7Schristos</PRE>
1323*946379e7Schristos
1324*946379e7Schristos<P>
Additionally we run the program <TT>&lsquo;xgettext&rsquo;</TT> on all source code files
which contain translatable strings and that's it: we have a running
program which does not depend on translations being available, but which
can use any that become available.
1329*946379e7Schristos
1330*946379e7Schristos</P>
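<P>
Instead of editing the definition by hand, the two variants can be
selected at compile time.  The following is a minimal sketch which assumes
that the build system defines <CODE>ENABLE_NLS</CODE> to 1 when translations
are wanted; <CODE>say_hello</CODE> is a hypothetical function used only for
illustration.

</P>

<PRE>
#include &#60;assert.h&#62;
#include &#60;stdio.h&#62;

/* ENABLE_NLS is assumed to come from the build system.  When it is
   undefined or 0, the program is compiled without any NLS code.  */
#if ENABLE_NLS
# include &#60;libintl.h&#62;
# define _(String) gettext (String)
#else
# define _(String) (String)
#endif

/* Hypothetical example function: print one translatable message.  */
static int
say_hello (FILE *out)
{
  assert (out != NULL);
  return fputs (_("Hello world\n"), out);
}
</PRE>

<P>
The source code of the program is the same in both configurations; only
the expansion of <CODE>_</CODE> differs.

</P>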
1331*946379e7Schristos<P>
1332*946379e7Schristos<A NAME="IDX1043"></A>
1333*946379e7SchristosThe same procedure can be done for the <CODE>gettext_noop</CODE> invocations
1334*946379e7Schristos(see section <A HREF="gettext_4.html#SEC18">4.7  Special Cases of Translatable Strings</A>).  One usually defines <CODE>gettext_noop</CODE> as a
1335*946379e7Schristosno-op macro.  So you should consider the following code for your project:
1336*946379e7Schristos
1337*946379e7Schristos</P>
1338*946379e7Schristos
1339*946379e7Schristos<PRE>
1340*946379e7Schristos#define gettext_noop(String) String
1341*946379e7Schristos#define N_(String) gettext_noop (String)
1342*946379e7Schristos</PRE>
1343*946379e7Schristos
1344*946379e7Schristos<P>
<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>.  The <TT>&lsquo;Makefile&rsquo;</TT> in
the <TT>&lsquo;po/&rsquo;</TT> directory of GNU <CODE>gettext</CODE> recognizes both of these
short forms by default, so you are invited to follow this proposal for
your own ease.
1349*946379e7Schristos
1350*946379e7Schristos</P>
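<P>
A typical use of <CODE>N_</CODE> is a static table: initializers cannot call
functions, so the strings are only marked for extraction there and are
translated later, at the point of use.  The sketch below assumes a
hypothetical table <CODE>status_names</CODE> and accessor
<CODE>status_name</CODE>.

</P>

<PRE>
#include &#60;assert.h&#62;
#include &#60;libintl.h&#62;
#include &#60;stddef.h&#62;

#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* The strings are marked for xgettext but stay untranslated here.  */
static const char *const status_names[] =
{
  N_("ready"),
  N_("busy"),
  N_("stopped")
};

/* Translate the marked string when it is actually needed.  */
static const char *
status_name (size_t code)
{
  assert (code &#60; sizeof status_names / sizeof status_names[0]);
  return gettext (status_names[code]);
}
</PRE>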
1351*946379e7Schristos<P>
Now to <CODE>catgets</CODE>.  The main problem is the work for the
programmer.  Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file.  He also has to take care of duplicate
entries, duplicate message IDs etc.  If he wants to have the same
quality in the message catalog as the GNU <CODE>gettext</CODE> program
provides, he also has to put the descriptive comments for the strings and
their locations in all source code files into the message catalog.  This is
nearly a Mission: Impossible.
1361*946379e7Schristos
1362*946379e7Schristos</P>
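<P>
For comparison, a lookup through the X/Open interface might look like the
following sketch.  The set and message numbers and the helper
<CODE>catalog_message</CODE> are hypothetical; the same numbers would have to
be maintained by hand in the message catalog source file.

</P>

<PRE>
#include &#60;assert.h&#62;
#include &#60;nl_types.h&#62;
#include &#60;stddef.h&#62;

/* Hypothetical numbers; they must match the catalog source file.  */
#define SET_MAIN  1
#define MSG_HELLO 1

/* Return message MSG_ID of set SET_ID, or FALLBACK when the catalog
   could not be opened or does not contain the message.  */
static const char *
catalog_message (nl_catd catalog, int set_id, int msg_id,
                 const char *fallback)
{
  assert (fallback != NULL);
  if (catalog == (nl_catd) -1)
    return fallback;
  return catgets (catalog, set_id, msg_id, fallback);
}
</PRE>

<P>
A program would obtain the descriptor with <CODE>catopen</CODE>, call
<CODE>catalog_message (catalog, SET_MAIN, MSG_HELLO, "Hello world")</CODE>,
and release a successfully opened catalog with <CODE>catclose</CODE>.

</P>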
1363*946379e7Schristos<P>
But there are also some points that people might count in favor of
<CODE>catgets</CODE>.  If you have a single word in a string and this string
is used in different contexts, it is likely that in one language or
another the word has different translations.  Example:
1368*946379e7Schristos
1369*946379e7Schristos</P>
1370*946379e7Schristos
1371*946379e7Schristos<PRE>
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
1376*946379e7Schristos</PRE>
1377*946379e7Schristos
1378*946379e7Schristos<P>
Here we have to translate the string <CODE>"number"</CODE> twice.  Even
if you do not speak a language besides English, you may be able to
recognize that the two words have different meanings.  In German the
first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second
to <CODE>"Zahl"</CODE>.
1384*946379e7Schristos
1385*946379e7Schristos</P>
1386*946379e7Schristos<P>
Now you can say that this example is really esoteric.  And you are
right!  This is exactly how we felt about this problem, and we decided that
it does not weigh that much.  The solution for the above problem could
be very easy:
1391*946379e7Schristos
1392*946379e7Schristos</P>
1393*946379e7Schristos
1394*946379e7Schristos<PRE>
printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
1400*946379e7Schristos</PRE>
1401*946379e7Schristos
1402*946379e7Schristos<P>
We believe that we can solve all conflicts with this method.  If it is
difficult, one can also consider changing one of the conflicting strings a
little bit.  But it is not impossible to overcome.
1406*946379e7Schristos
1407*946379e7Schristos</P>
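<P>
The second call can also be written with <CODE>ngettext</CODE>, which lets
the translation choose among as many plural forms as its language needs.
The following is only a sketch; the helper
<CODE>format_number_count</CODE> is hypothetical, and the count is an
<CODE>unsigned long</CODE> here because that is the type
<CODE>ngettext</CODE> expects.

</P>

<PRE>
#include &#60;assert.h&#62;
#include &#60;libintl.h&#62;
#include &#60;stdio.h&#62;

/* Hypothetical helper: format the whole sentence, so that translators
   see complete messages instead of isolated words.  Returns what
   snprintf returns.  */
static int
format_number_count (char *buf, size_t size, unsigned long count)
{
  assert (buf != NULL);
  return snprintf (buf, size,
                   ngettext ("you should see %lu number",
                             "you should see %lu numbers", count),
                   count);
}
</PRE>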
1408*946379e7Schristos<P>
<CODE>catgets</CODE> allows the same original entry to have different translations,
but <CODE>gettext</CODE> has another, scalable approach for solving ambiguities
of this kind: See section <A HREF="gettext_11.html#SEC170">11.2.2  Solving Ambiguities</A>.
1412*946379e7Schristos
1413*946379e7Schristos</P>
1414*946379e7Schristos
1415*946379e7Schristos
1416*946379e7Schristos<H2><A NAME="SEC177" HREF="gettext_toc.html#TOC177">11.4  Using libintl.a in own programs</A></H2>
1417*946379e7Schristos
1418*946379e7Schristos<P>
Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be
self-contained, i.e., you can use it in your own programs without
providing additional functions.  The <TT>&lsquo;Makefile&rsquo;</TT> will put the header
and the library in directories selected using the <CODE>$(prefix)</CODE>.
1423*946379e7Schristos
1424*946379e7Schristos</P>
1425*946379e7Schristos
1426*946379e7Schristos
1427*946379e7Schristos<H2><A NAME="SEC178" HREF="gettext_toc.html#TOC178">11.5  Being a <CODE>gettext</CODE> grok</A></H2>
1428*946379e7Schristos
1429*946379e7Schristos<P>
1430*946379e7Schristos<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
1431*946379e7Schristosrevised.
1432*946379e7Schristos
1433*946379e7Schristos</P>
1434*946379e7Schristos<P>
To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it
is surely helpful to read the source code.  But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:
1439*946379e7Schristos
1440*946379e7Schristos</P>
1441*946379e7Schristos
1442*946379e7Schristos<UL>
1443*946379e7Schristos<LI>Changing the language at runtime
1444*946379e7Schristos
1445*946379e7Schristos<A NAME="IDX1044"></A>
1446*946379e7Schristos
For interactive programs it might be useful to offer a selection of the
language used at runtime.  To understand how to do this one needs to know
how the language is determined while executing the <CODE>gettext</CODE>
function.  The method which is presented here only works correctly
with the GNU implementation of the <CODE>gettext</CODE> functions.
1452*946379e7Schristos
On every call, the function <CODE>dcgettext</CODE> determines and uses the
current setting of the highest priority environment variable that is set.
The variables are, in order of decreasing priority:
1457*946379e7Schristos
1458*946379e7Schristos
1459*946379e7Schristos<OL>
1460*946379e7Schristos<LI><CODE>LANGUAGE</CODE>
1461*946379e7Schristos
1462*946379e7Schristos<A NAME="IDX1045"></A>
1463*946379e7Schristos
1464*946379e7Schristos<A NAME="IDX1046"></A>
1465*946379e7Schristos<LI><CODE>LC_ALL</CODE>
1466*946379e7Schristos
1467*946379e7Schristos<A NAME="IDX1047"></A>
1468*946379e7Schristos<A NAME="IDX1048"></A>
1469*946379e7Schristos<A NAME="IDX1049"></A>
1470*946379e7Schristos<A NAME="IDX1050"></A>
1471*946379e7Schristos<A NAME="IDX1051"></A>
1472*946379e7Schristos<A NAME="IDX1052"></A>
1473*946379e7Schristos<LI><CODE>LC_xxx</CODE>, according to selected locale
1474*946379e7Schristos
1475*946379e7Schristos<A NAME="IDX1053"></A>
1476*946379e7Schristos<LI><CODE>LANG</CODE>
1477*946379e7Schristos
1478*946379e7Schristos</OL>
1479*946379e7Schristos
1480*946379e7SchristosAfterwards the path is constructed using the found value and the
1481*946379e7Schristostranslation file is loaded if available.
1482*946379e7Schristos
What happens now when the value for, say, <CODE>LANGUAGE</CODE> changes?  According
to the process explained above the new value of this variable is found
as soon as the <CODE>dcgettext</CODE> function is called.  But this also means
that a (perhaps) different message catalog file is loaded.  In other
words: the language used is changed.
1488*946379e7Schristos
But there is one little catch.  The code for gcc-2.7.0 and up provides
some optimization.  This optimization normally prevents the calling of
the <CODE>dcgettext</CODE> function as long as no new catalog is loaded.  But
if <CODE>dcgettext</CODE> is not called, the program cannot notice that the
<CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext_11.html#SEC175">11.2.7  Optimization of the *gettext functions</A>).  The
solution is very easy: include the following code in the
language switching function.
1496*946379e7Schristos
1497*946379e7Schristos
1498*946379e7Schristos<PRE>
1499*946379e7Schristos  /* Change language.  */
1500*946379e7Schristos  setenv ("LANGUAGE", "fr", 1);
1501*946379e7Schristos
1502*946379e7Schristos  /* Make change known.  */
1503*946379e7Schristos  {
1504*946379e7Schristos    extern int  _nl_msg_cat_cntr;
1505*946379e7Schristos    ++_nl_msg_cat_cntr;
1506*946379e7Schristos  }
1507*946379e7Schristos</PRE>
1508*946379e7Schristos
1509*946379e7Schristos<A NAME="IDX1054"></A>
The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>&lsquo;loadmsgcat.c&rsquo;</TT>.
You don't need to know what this is for.  But it can be used to detect
whether a <CODE>gettext</CODE> implementation is GNU <CODE>gettext</CODE> rather
than a non-GNU system's native <CODE>gettext</CODE> implementation.
1514*946379e7Schristos
1515*946379e7Schristos</UL>
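<P>
The pieces described in the list above can be combined into one small
module.  This is a simplified sketch, not the lookup performed inside
<CODE>dcgettext</CODE>: the function names are hypothetical,
<CODE>LC_MESSAGES</CODE> stands in for <CODE>LC_xxx</CODE>, and the
declaration of <CODE>_nl_msg_cat_cntr</CODE> assumes that the program is
linked against the GNU implementation.

</P>

<PRE>
#include &#60;assert.h&#62;
#include &#60;stdlib.h&#62;

/* Defined by the GNU implementation (see loadmsgcat.c); linking
   fails with a non-GNU gettext.  */
extern int _nl_msg_cat_cntr;

/* Hypothetical helper: return the value of the highest priority
   variable that is set and not empty, or NULL if none is.  */
static const char *
message_language_from_env (void)
{
  static const char *const names[] =
    { "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG" };
  size_t count = sizeof names / sizeof names[0];
  size_t i;

  for (i = 0; i != count; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL)
        if (value[0] != '\0')
          return value;
    }
  return NULL;
}

/* Hypothetical helper: change the language and make the change
   known to the optimized gettext code.  Returns 0 on success.  */
static int
switch_language (const char *language)
{
  assert (language != NULL);
  if (setenv ("LANGUAGE", language, 1) != 0)
    return -1;
  ++_nl_msg_cat_cntr;
  return 0;
}
</PRE>

<P>
After <CODE>switch_language ("fr")</CODE> succeeds, subsequent
<CODE>gettext</CODE> calls look for a French catalog, provided one is
installed and the program has set up its locale and text domain as usual.

</P>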
1516*946379e7Schristos
1517*946379e7Schristos
1518*946379e7Schristos
1519*946379e7Schristos<H2><A NAME="SEC179" HREF="gettext_toc.html#TOC179">11.6  Temporary Notes for the Programmers Chapter</A></H2>
1520*946379e7Schristos
1521*946379e7Schristos<P>
1522*946379e7Schristos<STRONG> NOTE: </STRONG> This documentation section is outdated and needs to be
1523*946379e7Schristosrevised.
1524*946379e7Schristos
1525*946379e7Schristos</P>
1526*946379e7Schristos
1527*946379e7Schristos
1528*946379e7Schristos
1529*946379e7Schristos<H3><A NAME="SEC180" HREF="gettext_toc.html#TOC180">11.6.1  Temporary - Two Possible Implementations</A></H3>
1530*946379e7Schristos
1531*946379e7Schristos<P>
1532*946379e7SchristosThere are two competing methods for language independent messages:
1533*946379e7Schristosthe X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
method.  The <CODE>catgets</CODE> method indexes messages by integers; the
<CODE>gettext</CODE> method indexes them by their original (English) strings.
1536*946379e7SchristosThe <CODE>catgets</CODE> method has been around longer and is supported
1537*946379e7Schristosby more vendors.  The <CODE>gettext</CODE> method is supported by Sun,
1538*946379e7Schristosand it has been heard that the COSE multi-vendor initiative is
1539*946379e7Schristossupporting it.  Neither method is a POSIX standard; the POSIX.1
1540*946379e7Schristoscommittee had a lot of disagreement in this area.
1541*946379e7Schristos
1542*946379e7Schristos</P>
1543*946379e7Schristos<P>
1544*946379e7SchristosNeither one is in the POSIX standard.  There was much disagreement
1545*946379e7Schristosin the POSIX.1 committee about using the <CODE>gettext</CODE> routines
1546*946379e7Schristosvs. <CODE>catgets</CODE> (XPG).  In the end the committee couldn't
1547*946379e7Schristosagree on anything, so no messaging system was included as part
1548*946379e7Schristosof the standard.  I believe the informative annex of the standard
1549*946379e7Schristosincludes the XPG3 messaging interfaces, “...as an example of
1550*946379e7Schristosa messaging system that has been implemented...”
1551*946379e7Schristos
1552*946379e7Schristos</P>
1553*946379e7Schristos<P>
1554*946379e7SchristosThey were very careful not to say anywhere that you should use one
1555*946379e7Schristosset of interfaces over the other.  For more on this topic please
1556*946379e7Schristossee the Programming for Internationalization FAQ.
1557*946379e7Schristos
1558*946379e7Schristos</P>
1559*946379e7Schristos
1560*946379e7Schristos
1561*946379e7Schristos<H3><A NAME="SEC181" HREF="gettext_toc.html#TOC181">11.6.2  Temporary - About <CODE>catgets</CODE></A></H3>
1562*946379e7Schristos
1563*946379e7Schristos<P>
1564*946379e7SchristosThere have been a few discussions of late on the use of
1565*946379e7Schristos<CODE>catgets</CODE> as a base.  I think it important to present both
1566*946379e7Schristossides of the argument and hence am opting to play devil's advocate
1567*946379e7Schristosfor a little bit.
1568*946379e7Schristos
1569*946379e7Schristos</P>
1570*946379e7Schristos<P>
1571*946379e7SchristosI'll not deny the fact that <CODE>catgets</CODE> could have been designed
1572*946379e7Schristosa lot better.  It currently has quite a number of limitations and
1573*946379e7Schristosthese have already been pointed out.
1574*946379e7Schristos
1575*946379e7Schristos</P>
1576*946379e7Schristos<P>
1577*946379e7SchristosHowever there is a great deal to be said for consistency and
1578*946379e7Schristosstandardization.  A common recurring problem when writing Unix
1579*946379e7Schristossoftware is the myriad portability problems across Unix platforms.
It seems as if every Unix vendor had a look at the operating system
and found parts they could improve upon.  No doubt many of these
modifications are innovative and solve real problems.
1583*946379e7SchristosHowever, software developers have a hard time keeping up with all
1584*946379e7Schristosthese changes across so many platforms.
1585*946379e7Schristos
1586*946379e7Schristos</P>
1587*946379e7Schristos<P>
1588*946379e7SchristosAnd this has prompted the Unix vendors to begin to standardize their
systems.  Hence the impetus for Spec1170.  Every major Unix vendor
has committed to supporting this standard and every Unix software
developer awaits with glee the day they can write software to this
standard and simply recompile (without having to use autoconf)
1593*946379e7Schristosacross different platforms.
1594*946379e7Schristos
1595*946379e7Schristos</P>
1596*946379e7Schristos<P>
1597*946379e7SchristosAs I understand it, Spec1170 is roughly based upon version 4 of the
1598*946379e7SchristosX/Open Portability Guidelines (XPG4).  Because <CODE>catgets</CODE> and
1599*946379e7Schristosfriends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>
1600*946379e7Schristosis a part of Spec1170 and hence will become a standardized component
1601*946379e7Schristosof all Unix systems.
1602*946379e7Schristos
1603*946379e7Schristos</P>
1604*946379e7Schristos
1605*946379e7Schristos
1606*946379e7Schristos<H3><A NAME="SEC182" HREF="gettext_toc.html#TOC182">11.6.3  Temporary - Why a single implementation</A></H3>
1607*946379e7Schristos
1608*946379e7Schristos<P>
Now it seems kind of wasteful to me to have two different systems
installed for accessing message catalogs.  If we do want to remedy
the deficiencies of <CODE>catgets</CODE>, why don't we try to expand <CODE>catgets</CODE>
(in a compatible manner) rather than implement an entirely new system?
Otherwise, we'll end up with two message catalog access systems installed
with an operating system: one set of routines for packages using GNU
<CODE>gettext</CODE> for their internationalization, and another set of routines
(<CODE>catgets</CODE>) for all other software.  Bloated?
1617*946379e7Schristos
1618*946379e7Schristos</P>
1619*946379e7Schristos<P>
Suppose another catalog access system is implemented.  Which do
we recommend?  At least for Linux, we need to attract as many
software developers as possible.  Hence we need to make it as easy
as possible for them to port their software, which means supporting
<CODE>catgets</CODE>.  We will be implementing the <CODE>libintl</CODE> code
within our <CODE>libc</CODE>, but does this mean we also have to incorporate
another message catalog access scheme within our <CODE>libc</CODE>?
And what about people who are going to be using the <CODE>libintl</CODE>
+ non-<CODE>catgets</CODE> routines?  When they port their software to
other platforms, they will have to include the front-end
(<CODE>libintl</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>
access routines) with their software instead of just including the
<CODE>libintl</CODE> code.
1633*946379e7Schristos
1634*946379e7Schristos</P>
1635*946379e7Schristos<P>
Message catalog support is, however, only the tip of the iceberg.
What about the data for the other locale categories?  They also have
a number of deficiencies.  Are we going to abandon them as well and
develop another duplicate set of routines (should <CODE>libintl</CODE>
expand beyond message catalog support)?
1641*946379e7Schristos
1642*946379e7Schristos</P>
1643*946379e7Schristos<P>
1644*946379e7SchristosLike many parts of Unix that can be improved upon, we're stuck with balancing
1645*946379e7Schristoscompatibility with the past with useful improvements and innovations for
1646*946379e7Schristosthe future.
1647*946379e7Schristos
1648*946379e7Schristos</P>
1649*946379e7Schristos
1650*946379e7Schristos
1651*946379e7Schristos<H3><A NAME="SEC183" HREF="gettext_toc.html#TOC183">11.6.4  Temporary - Notes</A></H3>
1652*946379e7Schristos
1653*946379e7Schristos<P>
X/Open agreed very late on the standard form, so many
implementations differ from the final form.  Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.
1657*946379e7Schristos
1658*946379e7Schristos</P>
1659*946379e7Schristos<P>
OK.  After incorporating the last changes I have to spend some time on
making the GNU/Linux <CODE>libc</CODE> <CODE>gettext</CODE> functions.  So in the future
Solaris will not be the only system having <CODE>gettext</CODE>.
1663*946379e7Schristos
1664*946379e7Schristos</P>
1667*946379e7Schristos</BODY>
1668*946379e7Schristos</HTML>
1669