xref: /netbsd-src/external/gpl2/gettext/dist/gettext-tools/doc/gettext_1.html (revision 946379e7b37692fc43f68eb0d1c10daa0a7f3b6c)
1*946379e7Schristos<HTML>
2*946379e7Schristos<HEAD>
3*946379e7Schristos<!-- This HTML file has been created by texi2html 1.52b
4*946379e7Schristos     from gettext.texi on 27 November 2006 -->
5*946379e7Schristos
6*946379e7Schristos<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
7*946379e7Schristos<TITLE>GNU gettext utilities - 1  Introduction</TITLE>
8*946379e7Schristos</HEAD>
9*946379e7Schristos<BODY>
10*946379e7SchristosGo to the first, previous, <A HREF="gettext_2.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
11*946379e7Schristos<P><HR><P>
12*946379e7Schristos
13*946379e7Schristos
14*946379e7Schristos
15*946379e7Schristos<H1><A NAME="SEC1" HREF="gettext_toc.html#TOC1">1  Introduction</A></H1>
16*946379e7Schristos
17*946379e7Schristos<P>
18*946379e7SchristosThis chapter explains the goals sought in the creation
19*946379e7Schristosof GNU <CODE>gettext</CODE> and the free Translation Project.
20*946379e7SchristosThen, it explains a few broad concepts around
21*946379e7SchristosNative Language Support, and positions message translation with regard
22*946379e7Schristosto other aspects of national and cultural variance, as they apply
23*946379e7Schristosto programs.  It also surveys those files used to convey the
24*946379e7Schristostranslations.  It explains how the various tools interact in the
25*946379e7Schristosinitial generation of these files, and later, how the maintenance
26*946379e7Schristoscycle should usually operate.
27*946379e7Schristos
28*946379e7Schristos</P>
29*946379e7Schristos<P>
30*946379e7Schristos<A NAME="IDX1"></A>
31*946379e7Schristos<A NAME="IDX2"></A>
32*946379e7Schristos<A NAME="IDX3"></A>
33*946379e7SchristosIn this manual, we use <EM>he</EM> when speaking of the programmer or
34*946379e7Schristosmaintainer, <EM>she</EM> when speaking of the translator, and <EM>they</EM>
35*946379e7Schristoswhen speaking of the installers or end users of the translated program.
36*946379e7SchristosThis is only a convenience for clarifying the documentation.  It is
37*946379e7Schristos<EM>absolutely</EM> not meant to imply that some roles are more appropriate
38*946379e7Schristosto males or females.  Besides, as you might guess, GNU <CODE>gettext</CODE>
39*946379e7Schristosis meant to be useful for people using computers, whatever their sex,
40*946379e7Schristosrace, religion or nationality!
41*946379e7Schristos
42*946379e7Schristos</P>
43*946379e7Schristos<P>
44*946379e7Schristos<A NAME="IDX4"></A>
45*946379e7SchristosPlease send suggestions and corrections to:
46*946379e7Schristos
47*946379e7Schristos</P>
48*946379e7Schristos
49*946379e7Schristos<PRE>
50*946379e7SchristosInternet address:
51*946379e7Schristos    bug-gnu-gettext@gnu.org
52*946379e7Schristos</PRE>
53*946379e7Schristos
54*946379e7Schristos<P>
55*946379e7SchristosPlease include the manual's edition number and update date in your messages.
56*946379e7Schristos
57*946379e7Schristos</P>
58*946379e7Schristos
59*946379e7Schristos
60*946379e7Schristos
61*946379e7Schristos<H2><A NAME="SEC2" HREF="gettext_toc.html#TOC2">1.1  The Purpose of GNU <CODE>gettext</CODE></A></H2>
62*946379e7Schristos
63*946379e7Schristos<P>
64*946379e7SchristosUsually, programs are written and documented in English, and use
65*946379e7SchristosEnglish at execution time to interact with users.  This is true
66*946379e7Schristosnot only of GNU software, but also of a great deal of commercial
67*946379e7Schristosand free software.  Using a common language is quite handy for
68*946379e7Schristoscommunication between developers, maintainers and users from all
countries.  On the other hand, most people are less comfortable with
English than with their own native language, and would prefer to
use their mother tongue for day-to-day work, as far as possible.
Many would simply <EM>love</EM> to see their computer screen showing
a lot less English, and far more of their own language.
74*946379e7Schristos
75*946379e7Schristos</P>
76*946379e7Schristos<P>
77*946379e7Schristos<A NAME="IDX5"></A>
However, to many people, this dream might appear so far-fetched that
they may believe it is not even worth spending time thinking about
it.  They have no confidence at all that the dream might ever
come true.  Yet some have not lost hope, and have organized themselves.
The Translation Project is a formalization of this hope into a
workable structure, which has a good chance of bringing all of us nearer
to the achievement of a truly multi-lingual set of programs.
85*946379e7Schristos
86*946379e7Schristos</P>
87*946379e7Schristos<P>
GNU <CODE>gettext</CODE> is an important step for the Translation Project,
as it is an asset on which we may build many other steps.  This package
offers programmers, translators, and even users a well-integrated
set of tools and documentation.  Specifically, the GNU <CODE>gettext</CODE>
utilities are a set of tools that provides a framework within which
other free packages may produce multi-lingual messages.  These tools
include
95*946379e7Schristos
96*946379e7Schristos</P>
97*946379e7Schristos
98*946379e7Schristos<UL>
99*946379e7Schristos<LI>
100*946379e7Schristos
101*946379e7SchristosA set of conventions about how programs should be written to support
102*946379e7Schristosmessage catalogs.
103*946379e7Schristos
104*946379e7Schristos<LI>
105*946379e7Schristos
106*946379e7SchristosA directory and file naming organization for the message catalogs
107*946379e7Schristosthemselves.
108*946379e7Schristos
109*946379e7Schristos<LI>
110*946379e7Schristos
111*946379e7SchristosA runtime library supporting the retrieval of translated messages.
112*946379e7Schristos
113*946379e7Schristos<LI>
114*946379e7Schristos
A few stand-alone programs that manipulate, in various ways, the sets of
translatable strings or already translated strings.
117*946379e7Schristos
118*946379e7Schristos<LI>
119*946379e7Schristos
120*946379e7SchristosA library supporting the parsing and creation of files containing
121*946379e7Schristostranslated messages.
122*946379e7Schristos
123*946379e7Schristos<LI>
124*946379e7Schristos
A special mode for Emacs<A NAME="DOCF1" HREF="gettext_foot.html#FOOT1">(1)</A> which helps in preparing these sets
and bringing them up to date.
127*946379e7Schristos</UL>
128*946379e7Schristos
129*946379e7Schristos<P>
GNU <CODE>gettext</CODE> is designed to minimize the impact of
internationalization on program sources, keeping this impact as small
and hardly noticeable as possible.  Internationalization has a better
chance of succeeding if it is very lightweight, or at least
appears to be so, when looking at program sources.
135*946379e7Schristos
136*946379e7Schristos</P>
137*946379e7Schristos<P>
The Translation Project also uses the GNU <CODE>gettext</CODE> distribution
as a vehicle for documenting its structure and methods.  This goes
beyond the strict technicalities of documenting GNU <CODE>gettext</CODE>
proper.  This way, translators will find in a single place, as
far as possible, all they need to know to do their
translating work properly.  This supplemental documentation might also
help programmers, and even curious users, understand how GNU
<CODE>gettext</CODE> is related to the remainder of the Translation
Project, and consequently get a glimpse of the <EM>big picture</EM>.
147*946379e7Schristos
148*946379e7Schristos</P>
149*946379e7Schristos
150*946379e7Schristos
151*946379e7Schristos<H2><A NAME="SEC3" HREF="gettext_toc.html#TOC3">1.2  I18n, L10n, and Such</A></H2>
152*946379e7Schristos
153*946379e7Schristos<P>
154*946379e7Schristos<A NAME="IDX6"></A>
155*946379e7Schristos<A NAME="IDX7"></A>
156*946379e7SchristosTwo long words appear all the time when we discuss support of native
157*946379e7Schristoslanguage in programs, and these words have a precise meaning, worth
158*946379e7Schristosbeing explained here, once and for all in this document.  The words are
159*946379e7Schristos<EM>internationalization</EM> and <EM>localization</EM>.  Many people,
160*946379e7Schristostired of writing these long words over and over again, took the
161*946379e7Schristoshabit of writing <EM>i18n</EM> and <EM>l10n</EM> instead, quoting the first
162*946379e7Schristosand last letter of each word, and replacing the run of intermediate
163*946379e7Schristosletters by a number merely telling how many such letters there are.
But in this manual, for the sake of clarity, we will patiently write
the names in full, each time...
166*946379e7Schristos
167*946379e7Schristos</P>
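<P>
The abbreviation rule just described is mechanical enough to be sketched
in C.  This is only an illustration: the function name
<CODE>numeronym</CODE> is our own invention and is not part of GNU
<CODE>gettext</CODE>.
</P>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.  Writes into OUT the
   abbreviation made of the first letter of WORD, the count of the
   letters in between, and the last letter.  Words shorter than three
   letters are copied unchanged.  Returns what snprintf returns.  */
int
numeronym (const char *word, char *out, size_t size)
{
  size_t len = strlen (word);

  if (len < 3)
    return snprintf (out, size, "%s", word);
  return snprintf (out, size, "%c%zu%c", word[0], len - 2, word[len - 1]);
}
```

<P>
Called on <CODE>"internationalization"</CODE> (20 letters, so 18 in between)
this yields <CODE>i18n</CODE>, and on <CODE>"localization"</CODE> it yields
<CODE>l10n</CODE>.
</P>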
168*946379e7Schristos<P>
169*946379e7Schristos<A NAME="IDX8"></A>
170*946379e7SchristosBy <EM>internationalization</EM>, one refers to the operation by which a
171*946379e7Schristosprogram, or a set of programs turned into a package, is made aware of and
able to support multiple languages.  This is a generalization process,
by which the programs are untied from using only English strings or
other English-specific habits, and connected to generic ways of doing
the same instead.  Program developers may use various techniques to
176*946379e7Schristosinternationalize their programs.  Some of these have been standardized.
177*946379e7SchristosGNU <CODE>gettext</CODE> offers one of these standards.  See section <A HREF="gettext_11.html#SEC164">11  The Programmer's View</A>.
178*946379e7Schristos
179*946379e7Schristos</P>
180*946379e7Schristos<P>
181*946379e7Schristos<A NAME="IDX9"></A>
182*946379e7SchristosBy <EM>localization</EM>, one means the operation by which, in a set
183*946379e7Schristosof programs already internationalized, one gives the program all
184*946379e7Schristosneeded information so that it can adapt itself to handle its input
185*946379e7Schristosand output in a fashion which is correct for some native language and
cultural habits.  This is a particularization process, by which generic
methods already implemented in an internationalized program are used
in specific ways.  The programming environment puts several functions
at the programmer's disposal which allow this runtime configuration.
The formal description of a specific set of cultural habits for some
country, together with all associated translations targeted to the
same native language, is called the <EM>locale</EM> for this language
or country.  Users achieve localization of programs by setting proper
values for special environment variables, prior to executing those
programs, identifying which locale should be used.
196*946379e7Schristos
197*946379e7Schristos</P>
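<P>
On the program side, the runtime configuration mentioned above usually
starts with a call to the ISO C function <CODE>setlocale</CODE>.  The
following is a minimal sketch, not a prescribed method; the wrapper name
<CODE>select_locale</CODE> is hypothetical.  Passing an empty string asks
the C library to consult the user's environment variables.
</P>

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical wrapper, for illustration only.  Tries to switch every
   locale category to NAME; the empty string "" means "whatever the
   user's environment variables request".  Returns the name of the
   locale in effect afterwards.  */
const char *
select_locale (const char *name)
{
  const char *result = setlocale (LC_ALL, name);

  if (result == NULL)
    /* The request could not be honored, so the previous locale is
       still in effect; query its name instead.  */
    result = setlocale (LC_ALL, NULL);
  return result;
}
```

<P>
A program would typically call <CODE>select_locale ("")</CODE> once, early
in <CODE>main</CODE>, so that the user's choice of locale takes effect
before any output is produced.
</P>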
198*946379e7Schristos<P>
199*946379e7SchristosIn fact, locale message support is only one component of the cultural
200*946379e7Schristosdata that makes up a particular locale.  There are a whole host of
201*946379e7Schristosroutines and functions provided to aid programmers in developing
202*946379e7Schristosinternationalized software and which allow them to access the data
203*946379e7Schristosstored in a particular locale.  When someone presently refers to a
204*946379e7Schristosparticular locale, they are obviously referring to the data stored
205*946379e7Schristoswithin that particular locale.  Similarly, if a programmer is referring
206*946379e7Schristosto “accessing the locale routines”, they are referring to the
207*946379e7Schristoscomplete suite of routines that access all of the locale's information.
208*946379e7Schristos
209*946379e7Schristos</P>
210*946379e7Schristos<P>
211*946379e7Schristos<A NAME="IDX10"></A>
212*946379e7Schristos<A NAME="IDX11"></A>
213*946379e7Schristos<A NAME="IDX12"></A>
214*946379e7SchristosOne uses the expression <EM>Native Language Support</EM>, or merely NLS,
215*946379e7Schristosfor speaking of the overall activity or feature encompassing both
216*946379e7Schristosinternationalization and localization, allowing for multi-lingual
217*946379e7Schristosinteractions in a program.  In a nutshell, one could say that
218*946379e7Schristosinternationalization is the operation by which further localizations
219*946379e7Schristosare made possible.
220*946379e7Schristos
221*946379e7Schristos</P>
222*946379e7Schristos<P>
223*946379e7SchristosAlso, very roughly said, when it comes to multi-lingual messages,
224*946379e7Schristosinternationalization is usually taken care of by programmers, and
225*946379e7Schristoslocalization is usually taken care of by translators.
226*946379e7Schristos
227*946379e7Schristos</P>
228*946379e7Schristos
229*946379e7Schristos
230*946379e7Schristos<H2><A NAME="SEC4" HREF="gettext_toc.html#TOC4">1.3  Aspects in Native Language Support</A></H2>
231*946379e7Schristos
232*946379e7Schristos<P>
233*946379e7Schristos<A NAME="IDX13"></A>
234*946379e7SchristosFor a totally multi-lingual distribution, there are many things to
235*946379e7Schristostranslate beyond output messages.
236*946379e7Schristos
237*946379e7Schristos</P>
238*946379e7Schristos
239*946379e7Schristos<UL>
240*946379e7Schristos<LI>
241*946379e7Schristos
242*946379e7SchristosAs of today, GNU <CODE>gettext</CODE> offers a complete toolset for
243*946379e7Schristostranslating messages output by C programs.  Perl scripts and shell
244*946379e7Schristosscripts will also need to be translated.  Even if there are today some hooks
245*946379e7Schristosby which this can be done, these hooks are not integrated as well as they
246*946379e7Schristosshould be.
247*946379e7Schristos
248*946379e7Schristos<LI>
249*946379e7Schristos
250*946379e7SchristosSome programs, like <CODE>autoconf</CODE> or <CODE>bison</CODE>, are able
251*946379e7Schristosto produce other programs (or scripts).  Even if the generating
252*946379e7Schristosprograms themselves are internationalized, the generated programs they
253*946379e7Schristosproduce may need internationalization on their own, and this indirect
254*946379e7Schristosinternationalization could be automated right from the generating
255*946379e7Schristosprogram.  In fact, quite usually, generating and generated programs
256*946379e7Schristoscould be internationalized independently, as the effort needed is
257*946379e7Schristosfairly orthogonal.
258*946379e7Schristos
259*946379e7Schristos<LI>
260*946379e7Schristos
261*946379e7SchristosA few programs include textual tables which might need translation
262*946379e7Schristosthemselves, independently of the strings contained in the program
263*946379e7Schristositself.  For example, RFC 1345 gives an English description for each
264*946379e7Schristoscharacter which the <CODE>recode</CODE> program is able to reconstruct at execution.
265*946379e7SchristosSince these descriptions are extracted from the RFC by mechanical means,
266*946379e7Schristostranslating them properly would require a prior translation of the RFC
267*946379e7Schristositself.
268*946379e7Schristos
269*946379e7Schristos<LI>
270*946379e7Schristos
Almost all programs accept options, which are often worded so as to
be descriptive for English readers; one might want to consider
offering translated versions of program options as well.
274*946379e7Schristos
275*946379e7Schristos<LI>
276*946379e7Schristos
277*946379e7SchristosMany programs read, interpret, compile, or are somewhat driven by
278*946379e7Schristosinput files which are texts containing keywords, identifiers, or
replies which are inherently translatable.  For example, one may want
<CODE>gcc</CODE> to allow diacriticized characters in identifiers or use
translated keywords; <SAMP>&lsquo;rm -i&rsquo;</SAMP> might accept something other than
<SAMP>&lsquo;y&rsquo;</SAMP> or <SAMP>&lsquo;n&rsquo;</SAMP> for replies, etc.  Even if the program will
eventually produce most of its output in foreign languages, one has
to decide whether the input syntax, option values, etc., are to be
localized or not.
286*946379e7Schristos
287*946379e7Schristos<LI>
288*946379e7Schristos
289*946379e7SchristosThe manual accompanying a package, as well as all documentation files
290*946379e7Schristosin the distribution, could surely be translated, too.  Translating a
291*946379e7Schristosmanual, with the intent of later keeping up with updates, is a major
292*946379e7Schristosundertaking in itself, generally.
293*946379e7Schristos
294*946379e7Schristos</UL>
295*946379e7Schristos
296*946379e7Schristos<P>
As we already stressed, translation is only one aspect of locales.
Other internationalization aspects are system services and are handled
in GNU <CODE>libc</CODE>.  Many attributes
are needed to define a country's cultural
conventions.  These attributes include, besides the country's native
language, the formatting of the date and time, the representation of
numbers, the symbols for currency, etc.  These local <EM>rules</EM> are
termed the country's locale.  The locale represents the knowledge
needed to support the country's native attributes.
306*946379e7Schristos
307*946379e7Schristos</P>
308*946379e7Schristos<P>
309*946379e7Schristos<A NAME="IDX14"></A>
There are a few major areas which may vary between countries and
hence define what a locale must describe.  The following list helps
put multi-lingual messages into the proper context of other tasks
related to locales.  See the GNU <CODE>libc</CODE> manual for details.
314*946379e7Schristos
315*946379e7Schristos</P>
316*946379e7Schristos<DL COMPACT>
317*946379e7Schristos
318*946379e7Schristos<DT><EM>Characters and Codesets</EM>
319*946379e7Schristos<DD>
320*946379e7Schristos<A NAME="IDX15"></A>
321*946379e7Schristos<A NAME="IDX16"></A>
322*946379e7Schristos<A NAME="IDX17"></A>
323*946379e7Schristos<A NAME="IDX18"></A>
324*946379e7Schristos
The codeset most commonly used throughout the USA and most English
speaking parts of the world is the ASCII codeset.  However, there are
many characters needed by various locales that are not found within
this codeset.  The 8-bit ISO 8859-1 codeset has most of the special
characters needed to handle the major European languages.  However, in
many cases, choosing ISO 8859-1 is nevertheless not adequate: it
doesn't even contain the symbol of the major European currency, the euro.
Hence each locale will need to specify which codeset it uses and will need
to have the appropriate character handling routines to cope with
the codeset.
335*946379e7Schristos
336*946379e7Schristos<DT><EM>Currency</EM>
337*946379e7Schristos<DD>
338*946379e7Schristos<A NAME="IDX19"></A>
339*946379e7Schristos<A NAME="IDX20"></A>
340*946379e7Schristos
The symbols used vary from country to country, as does the position
of the symbol.  Software needs to be able to transparently
display currency figures in the native mode for each locale.
344*946379e7Schristos
345*946379e7Schristos<DT><EM>Dates</EM>
346*946379e7Schristos<DD>
347*946379e7Schristos<A NAME="IDX21"></A>
348*946379e7Schristos<A NAME="IDX22"></A>
349*946379e7Schristos
The format of dates varies between locales.  For example, Christmas day
in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
Other countries might use ISO 8601 dates, etc.
353*946379e7Schristos
354*946379e7SchristosTime of the day may be noted as <VAR>hh</VAR>:<VAR>mm</VAR>, <VAR>hh</VAR>.<VAR>mm</VAR>,
355*946379e7Schristosor otherwise.  Some locales require time to be specified in 24-hour
356*946379e7Schristosmode rather than as AM or PM.  Further, the nature and yearly extent
357*946379e7Schristosof the Daylight Saving correction vary widely between countries.
358*946379e7Schristos
359*946379e7Schristos<DT><EM>Numbers</EM>
360*946379e7Schristos<DD>
361*946379e7Schristos<A NAME="IDX23"></A>
362*946379e7Schristos<A NAME="IDX24"></A>
363*946379e7Schristos
364*946379e7SchristosNumbers can be represented differently in different locales.
365*946379e7SchristosFor example, the following numbers are all written correctly for
366*946379e7Schristostheir respective locales:
367*946379e7Schristos
368*946379e7Schristos
369*946379e7Schristos<PRE>
370*946379e7Schristos12,345.67       English
371*946379e7Schristos12.345,67       German
372*946379e7Schristos 12345,67       French
373*946379e7Schristos1,2345.67       Asia
374*946379e7Schristos</PRE>
375*946379e7Schristos
376*946379e7SchristosSome programs could go further and use different unit systems, like
377*946379e7SchristosEnglish units or Metric units, or even take into account variants
378*946379e7Schristosabout how numbers are spelled in full.
379*946379e7Schristos
380*946379e7Schristos<DT><EM>Messages</EM>
381*946379e7Schristos<DD>
382*946379e7Schristos<A NAME="IDX25"></A>
383*946379e7Schristos<A NAME="IDX26"></A>
384*946379e7Schristos
385*946379e7SchristosThe most obvious area is the language support within a locale.  This is
386*946379e7Schristoswhere GNU <CODE>gettext</CODE> provides the means for developers and users to
387*946379e7Schristoseasily change the language that the software uses to communicate to
388*946379e7Schristosthe user.
389*946379e7Schristos
390*946379e7Schristos</DL>
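<P>
The non-message categories listed above are reachable through standard
ISO C calls.  The sketch below is only an illustration under stated
assumptions: the two function names are hypothetical, and which locale
names are installed (apart from <CODE>"C"</CODE>) depends on the system.
It queries the decimal point with <CODE>localeconv</CODE> and formats a
date with the <CODE>%x</CODE> conversion of <CODE>strftime</CODE>.
</P>

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: returns the decimal-point string used for
   numbers in the locale LOCALE_NAME, or NULL if that locale is not
   available.  The LC_NUMERIC category stays switched afterwards.  */
const char *
decimal_point_for (const char *locale_name)
{
  if (setlocale (LC_NUMERIC, locale_name) == NULL)
    return NULL;
  return localeconv ()->decimal_point;
}

/* Hypothetical helper: formats the given calendar date the way the
   locale LOCALE_NAME writes dates.  Returns the number of characters
   stored in BUF, or 0 if the locale is unavailable or BUF is too
   small.  */
size_t
format_date (const char *locale_name, int year, int month, int day,
             char *buf, size_t size)
{
  struct tm tm = { 0 };

  if (setlocale (LC_TIME, locale_name) == NULL)
    return 0;
  tm.tm_year = year - 1900;     /* struct tm counts years from 1900 */
  tm.tm_mon = month - 1;        /* and months from 0 */
  tm.tm_mday = day;
  return strftime (buf, size, "%x", &tm);
}
```

<P>
In the <CODE>"C"</CODE> locale the decimal point is <SAMP>&lsquo;.&rsquo;</SAMP> and
<CODE>format_date ("C", 1994, 12, 25, ...)</CODE> stores 12/25/94, the USA
notation quoted above.  Other locales, where installed, may give different
results for the same calls.
</P>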
391*946379e7Schristos
392*946379e7Schristos<P>
393*946379e7Schristos<A NAME="IDX27"></A>
Components of locale outside of message handling are standardized in
the ISO C standard and the SUSv2 specification.  GNU <CODE>libc</CODE>
fully implements them, and most other modern systems provide more
or less reasonable support for at least some of these components.
398*946379e7Schristos
399*946379e7Schristos</P>
400*946379e7Schristos
401*946379e7Schristos
402*946379e7Schristos<H2><A NAME="SEC5" HREF="gettext_toc.html#TOC5">1.4  Files Conveying Translations</A></H2>
403*946379e7Schristos
404*946379e7Schristos<P>
405*946379e7Schristos<A NAME="IDX28"></A>
The letters PO in <TT>&lsquo;.po&rsquo;</TT> files mean Portable Object, to
distinguish them from <TT>&lsquo;.mo&rsquo;</TT> files, where MO stands for Machine
Object.  This paradigm, as well as the PO file format, is inspired
by the NLS standard developed by Uniforum, and first implemented by
Sun in their Solaris system.
411*946379e7Schristos
412*946379e7Schristos</P>
413*946379e7Schristos<P>
414*946379e7SchristosPO files are meant to be read and edited by humans, and associate each
415*946379e7Schristosoriginal, translatable string of a given package with its translation
416*946379e7Schristosin a particular target language.  A single PO file is dedicated to
417*946379e7Schristosa single target language.  If a package supports many languages,
418*946379e7Schristosthere is one such PO file per language supported, and each package
419*946379e7Schristoshas its own set of PO files.  These PO files are best created by
420*946379e7Schristosthe <CODE>xgettext</CODE> program, and later updated or refreshed through
421*946379e7Schristosthe <CODE>msgmerge</CODE> program.  Program <CODE>xgettext</CODE> extracts all
422*946379e7Schristosmarked messages from a set of C files and initializes a PO file with
423*946379e7Schristosempty translations.  Program <CODE>msgmerge</CODE> takes care of adjusting
424*946379e7SchristosPO files between releases of the corresponding sources, commenting
425*946379e7Schristosobsolete entries, initializing new ones, and updating all source
line references.  Files ending with <TT>&lsquo;.pot&rsquo;</TT> are a kind of base
translation file found in distributions, in PO file format.
428*946379e7Schristos
429*946379e7Schristos</P>
430*946379e7Schristos<P>
431*946379e7SchristosMO files are meant to be read by programs, and are binary in nature.
432*946379e7SchristosA few systems already offer tools for creating and handling MO files
433*946379e7Schristosas part of the Native Language Support coming with the system, but the
434*946379e7Schristosformat of these MO files is often different from system to system,
435*946379e7Schristosand non-portable.  The tools already provided with these systems don't
436*946379e7Schristossupport all the features of GNU <CODE>gettext</CODE>.  Therefore GNU
437*946379e7Schristos<CODE>gettext</CODE> uses its own format for MO files.  Files ending with
438*946379e7Schristos<TT>&lsquo;.gmo&rsquo;</TT> are really MO files, when it is known that these files use
439*946379e7Schristosthe GNU format.
440*946379e7Schristos
441*946379e7Schristos</P>
442*946379e7Schristos
443*946379e7Schristos
444*946379e7Schristos<H2><A NAME="SEC6" HREF="gettext_toc.html#TOC6">1.5  Overview of GNU <CODE>gettext</CODE></A></H2>
445*946379e7Schristos
446*946379e7Schristos<P>
447*946379e7Schristos<A NAME="IDX29"></A>
448*946379e7Schristos<A NAME="IDX30"></A>
449*946379e7Schristos<A NAME="IDX31"></A>
450*946379e7SchristosThe following diagram summarizes the relation between the files
451*946379e7Schristoshandled by GNU <CODE>gettext</CODE> and the tools acting on these files.
452*946379e7SchristosIt is followed by somewhat detailed explanations, which you should
453*946379e7Schristosread while keeping an eye on the diagram.  Having a clear understanding
454*946379e7Schristosof these interrelations will surely help programmers, translators
455*946379e7Schristosand maintainers.
456*946379e7Schristos
457*946379e7Schristos</P>
458*946379e7Schristos
459*946379e7Schristos<PRE>
Original C Sources ───> Preparation ───> Marked C Sources ───╮
                                                             │
              ╭─────────<─── GNU gettext Library             │
╭─── make <───┤                                              │
│             ╰─────────<────────────────────┬───────────────╯
│                                            │
│   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
│   │                                            │              ↑
│   │                                            ╰───╮          │
│   ╰───╮                                            ├───> PO editor ───╮
│       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
│   ╭───╯                                                               │
│   │                                                                   │
│   ╰─────────────<───────────────╮                                     │
│                                 ├─── New LANG.po <────────────────────╯
│   ╭─── LANG.gmo <─── msgfmt <───╯
│   │
│   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
│                                              ├───> "Hello world!"
╰───────> install ───> /.../bin/PROGRAM ───────╯
482*946379e7Schristos</PRE>
483*946379e7Schristos
484*946379e7Schristos<P>
485*946379e7Schristos<A NAME="IDX32"></A>
486*946379e7SchristosAs a programmer, the first step to bringing GNU <CODE>gettext</CODE>
487*946379e7Schristosinto your package is identifying, right in the C sources, those strings
488*946379e7Schristoswhich are meant to be translatable, and those which are untranslatable.
This tedious job can be done a little more comfortably using Emacs PO
mode, but you can use any means familiar to you for modifying your
C sources.  Besides this, some other simple, standard changes are needed to
properly initialize the translation library.  See section <A HREF="gettext_4.html#SEC11">4  Preparing Program Sources</A>, for
more information about all this.
494*946379e7Schristos
495*946379e7Schristos</P>
496*946379e7Schristos<P>
For newly written software, the strings can and should be
marked while the code is being written.  The <CODE>gettext</CODE> approach makes this
very easy.  Simply put the following lines at the beginning of each file
or in a central header file:
501*946379e7Schristos
502*946379e7Schristos</P>
503*946379e7Schristos
504*946379e7Schristos<PRE>
#define _(String) (String)
#define N_(String) String
#define textdomain(Domain)
#define bindtextdomain(Package, Directory)
509*946379e7Schristos</PRE>
510*946379e7Schristos
511*946379e7Schristos<P>
Doing this allows you to prepare the sources for internationalization.
Later, when you feel ready to start using the <CODE>gettext</CODE> library,
simply replace these definitions with the following:
515*946379e7Schristos
516*946379e7Schristos</P>
517*946379e7Schristos<P>
518*946379e7Schristos<A NAME="IDX33"></A>
519*946379e7Schristos
520*946379e7Schristos<PRE>
#include &#60;libintl.h&#62;
#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
525*946379e7Schristos</PRE>
526*946379e7Schristos
527*946379e7Schristos<P>
528*946379e7Schristos<A NAME="IDX34"></A>
529*946379e7Schristos<A NAME="IDX35"></A>
530*946379e7Schristosand link against <TT>&lsquo;libintl.a&rsquo;</TT> or <TT>&lsquo;libintl.so&rsquo;</TT>.  Note that on
531*946379e7SchristosGNU systems, you don't need to link with <CODE>libintl</CODE> because the
532*946379e7Schristos<CODE>gettext</CODE> library functions are already contained in GNU libc.
533*946379e7SchristosThat is all you have to change.
534*946379e7Schristos
535*946379e7Schristos</P>
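<P>
To show how these definitions fit together, here is a minimal sketch of a
source file using them.  It is an illustration, not a template from the
package: the names <CODE>PACKAGE</CODE>, <CODE>LOCALEDIR</CODE>,
<CODE>init_nls</CODE>, <CODE>greeting</CODE> and <CODE>weekday_name</CODE>
are hypothetical, as are the domain name and the catalog directory.
<CODE>_</CODE> translates a string where it is used, while <CODE>N_</CODE>
only marks a string in a static initializer, where no function call is
allowed, so that it can be translated later by an explicit call to
<CODE>gettext</CODE>.
</P>

```c
#include <libintl.h>
#include <locale.h>

#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Hypothetical values; a real package normally gets them from its
   build system.  */
#define PACKAGE "example-package"
#define LOCALEDIR "/usr/share/locale"

/* Static initializers cannot call functions, so these strings are only
   marked for extraction here and translated when they are used.  */
static const char *const weekday_names[] = { N_("Monday"), N_("Tuesday") };

/* Hypothetical helper: the usual initialization, done once at startup.  */
void
init_nls (void)
{
  setlocale (LC_ALL, "");               /* adopt the user's locale */
  bindtextdomain (PACKAGE, LOCALEDIR);  /* where the catalogs live */
  textdomain (PACKAGE);                 /* default message domain */
}

/* Translated at the point of use.  */
const char *
greeting (void)
{
  return _("Hello world!");
}

/* INDEX must be 0 or 1.  Translates a string that was marked with N_.  */
const char *
weekday_name (int index)
{
  return gettext (weekday_names[index]);
}
```

<P>
When no catalog for the domain is installed, <CODE>gettext</CODE> returns
its argument unchanged, so the program keeps working with the original
English strings.
</P>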
<P>
<A NAME="IDX36"></A>
<A NAME="IDX37"></A>
Once the C sources have been modified, the <CODE>xgettext</CODE> program
is used to find and extract all translatable strings, and create a
PO template file out of all these.  This <TT>&lsquo;<VAR>package</VAR>.pot&rsquo;</TT> file
contains all original program strings.  For each string, it records
references to exactly where in the C sources that string is used.  All
translations are set to empty.  The letter <CODE>t</CODE> in <TT>&lsquo;.pot&rsquo;</TT> marks this as
a Template PO file, not yet oriented towards any particular language.
See section <A HREF="gettext_5.html#SEC22">5.1  Invoking the <CODE>xgettext</CODE> Program</A>, for more details about how one calls the
<CODE>xgettext</CODE> program.  If you are <EM>really</EM> lazy, you might
be interested in doing a lot more work right away, and preparing the
whole distribution setup (see section <A HREF="gettext_13.html#SEC196">13  The Maintainer's View</A>).  By doing so, you
spare yourself typing the <CODE>xgettext</CODE> command, as <CODE>make</CODE>
should then generate the proper things automatically for you!

</P>
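<P>
To make the shape of a template entry concrete, here is a small sketch
that formats one entry: a reference comment pointing at the source
location, the original string, and an empty translation.  The function
<CODE>format_pot_entry</CODE> is a hypothetical helper written for this
illustration; it is not how <CODE>xgettext</CODE> is implemented, and it
assumes the string contains nothing that would need escaping.
</P>

```c
#include <stdio.h>

/* Write one template entry for MSGID found at FILE:LINE into BUF.
   Returns the number of characters written, or -1 if BUF is too small.
   Simplification: MSGID must not contain '"', '\\' or newlines, which a
   real PO writer would have to escape.  */
static int
format_pot_entry (char *buf, size_t size,
                  const char *file, int line, const char *msgid)
{
  int n = snprintf (buf, size,
                    "#: %s:%d\n"
                    "msgid \"%s\"\n"
                    "msgstr \"\"\n",
                    file, line, msgid);
  if (n < 0 || (size_t) n >= size)
    return -1;
  return n;
}
```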
<P>
The first time through, there is no <TT>&lsquo;<VAR>lang</VAR>.po&rsquo;</TT> yet, so the
<CODE>msgmerge</CODE> step may be skipped and replaced by a mere copy of
<TT>&lsquo;<VAR>package</VAR>.pot&rsquo;</TT> to <TT>&lsquo;<VAR>lang</VAR>.po&rsquo;</TT>, where <VAR>lang</VAR>
represents the target language.  See section <A HREF="gettext_6.html#SEC31">6  Creating a New PO File</A> for details.

</P>
<P>
Then comes the initial translation of messages.  Translation is a
subject in itself, still exclusively a job for humans,
and its complexity goes far beyond the scope of this manual.
Nevertheless, a few hints are given in another chapter of this
manual (see section <A HREF="gettext_12.html#SEC184">12  The Translator's View</A>).  There you will also find information
about how to contact translation teams, or become part of one,
to share your translation concerns with others who target the same
native language.

</P>
<P>
While adding the translated messages into the <TT>&lsquo;<VAR>lang</VAR>.po&rsquo;</TT>
PO file, if you are not using one of the dedicated PO file editors
(see section <A HREF="gettext_8.html#SEC49">8  Editing PO Files</A>), you are on your own
for ensuring that your work fully respects the PO file format and quoting
conventions (see section <A HREF="gettext_3.html#SEC10">3  The Format of PO Files</A>).  This is surely not an impossible task,
as this is how many people handled PO files around 1995.
On the other hand, by using a PO file editor, most details
of the PO file format are taken care of for you, but you have to acquire
some familiarity with the PO file editor itself.

</P>
<P>
If some common translations have already been saved into a compendium
PO file, translators may use PO mode for initializing untranslated
entries from the compendium, and also save selected translations into
the compendium, updating it (see section <A HREF="gettext_8.html#SEC66">8.3.14  Using Translation Compendia</A>).  Compendium files
are meant to be exchanged between members of a given translation team.

</P>
<P>
Programs, or packages of programs, are dynamic in nature: users write
bug reports and suggestions for improvement, and maintainers react by
modifying programs in various ways.  The fact that a package has
already been internationalized should not make maintainers shy
of adding new strings, or modifying strings already translated.
They just do their job the best they can.  For the Translation
Project to work smoothly, it is important that maintainers do not
carry translation concerns on their already loaded shoulders, and that
translators be kept as free as possible of programming concerns.

</P>
<P>
The only concern maintainers should have is to carefully mark new
strings as translatable where appropriate; they need not otherwise
worry about these strings being translated, as this will come in due time.
Consequently, as programs and their strings are adjusted in various
ways by maintainers, usually for reasons unrelated to translation,
<CODE>xgettext</CODE> produces <TT>&lsquo;<VAR>package</VAR>.pot&rsquo;</TT> files which
evolve over time, so the translations carried by <TT>&lsquo;<VAR>lang</VAR>.po&rsquo;</TT>
slowly fall out of date.

</P>
<P>
<A NAME="IDX38"></A>
It is important for translators (and even maintainers) to understand
that package translation is a continuous process in the lifetime of a
package, and not something which is done once and for all at the start.
After an initial burst of translation activity for a given package,
interventions are needed once in a while, because here and there,
translated entries become obsolete, and new untranslated entries
appear, needing translation.

</P>
<P>
The <CODE>msgmerge</CODE> program refreshes an already
existing <TT>&lsquo;<VAR>lang</VAR>.po&rsquo;</TT> file by comparing it with a newer
<TT>&lsquo;<VAR>package</VAR>.pot&rsquo;</TT> template file, extracted by <CODE>xgettext</CODE>
out of recent C sources.  The refreshing operation adjusts all
references to C source locations for strings, since these strings
move as programs are modified.  Also, <CODE>msgmerge</CODE> comments out as
obsolete, in <TT>&lsquo;<VAR>lang</VAR>.po&rsquo;</TT>, those already translated entries
which are no longer used in the program sources (see section <A HREF="gettext_8.html#SEC60">8.3.8  Obsolete Entries</A>).  Finally, it discovers new strings and inserts them in
the resulting PO file as untranslated entries (see section <A HREF="gettext_8.html#SEC59">8.3.7  Untranslated Entries</A>).  See section <A HREF="gettext_7.html#SEC40">7.1  Invoking the <CODE>msgmerge</CODE> Program</A>, for more information about what
<CODE>msgmerge</CODE> really does.

</P>
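<P>
The refreshing logic described above can be sketched in a few lines of C.
This is a deliberately simplified model, not the real <CODE>msgmerge</CODE>:
the type <CODE>struct po_entry</CODE> and the functions <CODE>find_entry</CODE> and
<CODE>merge_catalog</CODE> are hypothetical, entries are matched by exact
string comparison only, and source references are not modelled at all.
</P>

```c
#include <stddef.h>
#include <string.h>

struct po_entry
{
  const char *msgid;
  const char *msgstr;   /* "" means untranslated */
  int obsolete;         /* nonzero once the string has left the sources */
};

/* Return the entry of OLD whose msgid equals MSGID, or NULL.  */
static const struct po_entry *
find_entry (const struct po_entry *old, size_t n_old, const char *msgid)
{
  size_t i;
  for (i = 0; i < n_old; i++)
    if (strcmp (old[i].msgid, msgid) == 0)
      return &old[i];
  return NULL;
}

/* Merge the old translations OLD with the msgids of a newer template.
   Writes at most MAX entries to OUT and returns how many were written.  */
static size_t
merge_catalog (const struct po_entry *old, size_t n_old,
               const char *const *template_ids, size_t n_ids,
               struct po_entry *out, size_t max)
{
  size_t i, n = 0;

  /* Strings still present in the template keep their translation;
     strings new to the template come out untranslated.  */
  for (i = 0; i < n_ids && n < max; i++)
    {
      const struct po_entry *prev = find_entry (old, n_old, template_ids[i]);
      out[n].msgid = template_ids[i];
      out[n].msgstr = prev != NULL ? prev->msgstr : "";
      out[n].obsolete = 0;
      n++;
    }

  /* Old entries that no longer appear in the template are kept,
     but flagged as obsolete.  */
  for (i = 0; i < n_old && n < max; i++)
    {
      size_t j;
      int still_used = 0;
      for (j = 0; j < n_ids; j++)
        if (strcmp (old[i].msgid, template_ids[j]) == 0)
          still_used = 1;
      if (!still_used)
        {
          out[n] = old[i];
          out[n].obsolete = 1;
          n++;
        }
    }
  return n;
}
```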
<P>
Whatever route or means taken, the goal is to obtain an updated
<TT>&lsquo;<VAR>lang</VAR>.po&rsquo;</TT> file offering translations for all strings.

</P>
<P>
The temporal mobility, or fluidity, of PO files is an integral part of
the translation game, and should be well understood and accepted.
People resisting it will have a hard time participating in the
Translation Project, or will give a hard time to other participants!  In
particular, maintainers should relax and include all available official
PO files in their distributions, even if these have not recently been
updated, without exerting pressure on the translation teams to get the
job done.  The pressure should rather come
from the community of users speaking a particular language, and
maintainers should consider themselves fairly relieved of any concern
about the adequacy of translation files.  On the other hand, translators
should reasonably try updating the PO files they are responsible for,
while the package is undergoing pretest, prior to an official
distribution.

</P>
<P>
Once the PO file is complete and dependable, the <CODE>msgfmt</CODE> program
is used to turn the PO file into a machine-oriented format, which
allows efficient retrieval of translations by the programs of the
package, whenever needed at runtime (see section <A HREF="gettext_10.html#SEC163">10.3  The Format of GNU MO Files</A>).  See section <A HREF="gettext_10.html#SEC143">10.1  Invoking the <CODE>msgfmt</CODE> Program</A>, for more information about all modes of execution
for the <CODE>msgfmt</CODE> program.

</P>
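<P>
As a rough illustration of why a machine-oriented format helps at runtime:
when the original strings are kept sorted, a translation can be found by
binary search instead of parsing PO text.  The sketch below uses a small
in-memory table; <CODE>struct catalog_entry</CODE>, <CODE>catalog</CODE> and
<CODE>lookup_translation</CODE> are hypothetical names, and the code does not
attempt to reproduce the actual MO file layout.
</P>

```c
#include <stdlib.h>
#include <string.h>

struct catalog_entry
{
  const char *msgid;
  const char *msgstr;
};

/* A tiny in-memory catalog; it must stay sorted by msgid (byte order)
   for the binary search below to work.  */
static const struct catalog_entry catalog[] = {
  { "Cancel", "Annuler" },
  { "Open",   "Ouvrir"  },
  { "Quit",   "Quitter" }
};

/* Comparison callback for bsearch(): orders entries by msgid.  */
static int
compare_entries (const void *key, const void *element)
{
  const struct catalog_entry *k = key;
  const struct catalog_entry *e = element;
  return strcmp (k->msgid, e->msgid);
}

/* Return the translation of MSGID, or MSGID itself when the catalog has
   no entry for it, which is also what gettext() does.  */
static const char *
lookup_translation (const char *msgid)
{
  struct catalog_entry key;
  const struct catalog_entry *found;

  key.msgid = msgid;
  key.msgstr = NULL;
  found = bsearch (&key, catalog, sizeof catalog / sizeof catalog[0],
                   sizeof catalog[0], compare_entries);
  return found != NULL ? found->msgstr : msgid;
}
```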
<P>
Finally, the modified and marked C sources are compiled and linked
with the GNU <CODE>gettext</CODE> library, usually through the operation of
<CODE>make</CODE>, provided a suitable <TT>&lsquo;Makefile&rsquo;</TT> exists for the project,
and the resulting executable is installed somewhere users will find it.
The MO files themselves should also be properly installed.  Provided the
appropriate environment variables are set (see section <A HREF="gettext_2.html#SEC9">2.2  Magic for End Users</A>), the
program should localize itself automatically, whenever it executes.

</P>
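<P>
For the program to localize itself, it typically performs a few calls at
startup.  The sketch below assumes a system where <TT>&lsquo;libintl.h&rsquo;</TT> is
available (on non-GNU systems it may also be necessary to link with
<CODE>libintl</CODE>).  The package name <CODE>demo-package</CODE>, the catalog
directory and the two function names are hypothetical placeholders.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical package name and catalog directory, for illustration.  */
#define PACKAGE "demo-package"
#define LOCALEDIR "/usr/local/share/locale"

/* Call once, early in main().  */
static void
init_localization (void)
{
  /* Adopt the locale selected by the user's environment
     (for example through LANG or LC_ALL).  */
  setlocale (LC_ALL, "");
  /* Tell gettext where the MO files of this package are installed,
     and make PACKAGE the default domain for gettext() calls.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
}

static const char *
localized_greeting (void)
{
  /* Falls back to the original string when no translation is found.  */
  return gettext ("Hello, world!");
}
```

<P>
When no MO file for the selected language is installed,
<CODE>gettext</CODE> returns its argument unchanged, so the program still
works and simply prints the original strings.
</P>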
<P>
The remainder of this manual explains in depth the various
steps outlined above.

</P>
</BODY>
</HTML>