<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 27 November 2006 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 1 Introduction</TITLE>
</HEAD>
<BODY>
<P><HR><P>

<H1><A NAME="SEC1" HREF="gettext_toc.html#TOC1">1 Introduction</A></H1>

<P>
This chapter explains the goals sought in the creation
of GNU <CODE>gettext</CODE> and the free Translation Project.
Then, it explains a few broad concepts around
Native Language Support, and positions message translation with regard
to other aspects of national and cultural variance, as they apply
to programs.  It also surveys the files used to convey the
translations.  It explains how the various tools interact in the
initial generation of these files, and later, how the maintenance
cycle should usually operate.

</P>
<P>
<A NAME="IDX1"></A>
<A NAME="IDX2"></A>
<A NAME="IDX3"></A>
In this manual, we use <EM>he</EM> when speaking of the programmer or
maintainer, <EM>she</EM> when speaking of the translator, and <EM>they</EM>
when speaking of the installers or end users of the translated program.
This is only a convenience for clarifying the documentation.  It is
<EM>absolutely</EM> not meant to imply that some roles are more appropriate
to males or females.  Besides, as you might guess, GNU <CODE>gettext</CODE>
is meant to be useful for people using computers, whatever their sex,
race, religion or nationality!

</P>
<P>
<A NAME="IDX4"></A>
Please send suggestions and corrections to:

</P>

<PRE>
Internet address:
    bug-gnu-gettext@gnu.org
</PRE>

<P>
Please include the manual's edition number and update date in your messages.

</P>



<H2><A NAME="SEC2" HREF="gettext_toc.html#TOC2">1.1 The Purpose of GNU <CODE>gettext</CODE></A></H2>

<P>
Usually, programs are written and documented in English, and use
English at execution time to interact with users.  This is true
not only of GNU software, but also of a great deal of commercial
and free software.  Using a common language is quite handy for
communication between developers, maintainers and users from all
countries.  On the other hand, most people are less comfortable with
English than with their own native language, and would prefer to
use their mother tongue for day-to-day work, as far as possible.
Many would simply <EM>love</EM> to see their computer screen showing
a lot less English, and far more of their own language.

</P>
<P>
<A NAME="IDX5"></A>
However, to many people, this dream might appear so far-fetched that
they may believe it is not even worth spending time thinking about
it.  They have no confidence at all that the dream might ever
come true.  Yet some have not lost hope, and have organized themselves.
The Translation Project is a formalization of this hope into a
workable structure, which has a good chance of bringing all of us closer
to a truly multi-lingual set of programs.

</P>
<P>
GNU <CODE>gettext</CODE> is an important step for the Translation Project,
as it is an asset on which we may build many other steps.  This package
offers programmers, translators and even users a well-integrated
set of tools and documentation.  Specifically, the GNU <CODE>gettext</CODE>
utilities are a set of tools that provide a framework within which
other free packages may produce multi-lingual messages.  These tools
include:

</P>

<UL>
<LI>

A set of conventions about how programs should be written to support
message catalogs.

<LI>

A directory and file naming organization for the message catalogs
themselves.
<LI>

A runtime library supporting the retrieval of translated messages.

<LI>

A few stand-alone programs to massage, in various ways, the sets of
translatable strings or already translated strings.

<LI>

A library supporting the parsing and creation of files containing
translated messages.

<LI>

A special mode for Emacs<A NAME="DOCF1" HREF="gettext_foot.html#FOOT1">(1)</A> which helps in preparing these sets
and bringing them up to date.
</UL>

<P>
GNU <CODE>gettext</CODE> is designed to minimize the impact of
internationalization on program sources, keeping this impact as small
and hardly noticeable as possible.  Internationalization has better
chances of succeeding if it is very lightweight, or at least
appears to be so, when looking at program sources.

</P>
<P>
The Translation Project also uses the GNU <CODE>gettext</CODE> distribution
as a vehicle for documenting its structure and methods.  This goes
beyond the strict technicalities of documenting the GNU <CODE>gettext</CODE>
proper.  This way, translators will find in a single place, as
far as possible, all they need to know to do their translation work
properly.
This supplemental documentation might also
help programmers, and even curious users, to understand how GNU
<CODE>gettext</CODE> is related to the rest of the Translation
Project, and consequently, to get a glimpse of the <EM>big picture</EM>.

</P>


<H2><A NAME="SEC3" HREF="gettext_toc.html#TOC3">1.2 I18n, L10n, and Such</A></H2>

<P>
<A NAME="IDX6"></A>
<A NAME="IDX7"></A>
Two long words appear all the time when we discuss support of native
language in programs, and these words have a precise meaning, worth
explaining here, once and for all in this document.  The words are
<EM>internationalization</EM> and <EM>localization</EM>.  Many people,
tired of writing these long words over and over again, got into the
habit of writing <EM>i18n</EM> and <EM>l10n</EM> instead, keeping the first
and last letter of each word, and replacing the run of intermediate
letters by a number merely telling how many such letters there are
(there are 18 letters between the <SAMP>‘i’</SAMP> and the final <SAMP>‘n’</SAMP>
of <EM>internationalization</EM>).
But in this manual, for the sake of clarity, we will patiently write
the names in full, each time...

</P>
<P>
<A NAME="IDX8"></A>
By <EM>internationalization</EM>, one refers to the operation by which a
program, or a set of programs turned into a package, is made aware of and
able to support multiple languages.
This is a generalization process,
by which the programs are untied from calling only English strings or
other English-specific habits, and connected to generic ways of doing
the same, instead.  Program developers may use various techniques to
internationalize their programs.  Some of these have been standardized.
GNU <CODE>gettext</CODE> offers one of these standards.  See section <A HREF="gettext_11.html#SEC164">11 The Programmer's View</A>.

</P>
<P>
<A NAME="IDX9"></A>
By <EM>localization</EM>, one means the operation by which, in a set
of programs already internationalized, one gives the program all
needed information so that it can adapt itself to handle its input
and output in a fashion which is correct for some native language and
cultural habits.  This is a particularization process, by which generic
methods already implemented in an internationalized program are used
in specific ways.  The programming environment puts several functions
at the programmer's disposal which allow this runtime configuration.
The formal description of a specific set of cultural habits for some
country, together with all associated translations targeted to the
same native language, is called the <EM>locale</EM> for this language
or country.  Users achieve localization of programs by setting special
environment variables, such as <CODE>LANG</CODE>, to values identifying
which locale should be used, before executing those programs.
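</P>
<P>
To illustrate how the environment selects the locale, here is a
minimal C sketch, assuming a POSIX system.  Passing an empty string to
<CODE>setlocale</CODE> makes the C library consult environment variables
such as <CODE>LC_ALL</CODE>, <CODE>LC_MESSAGES</CODE> and <CODE>LANG</CODE>.
The function <CODE>select_locale</CODE> is a hypothetical name used only
for this example.
</P>

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   NAME is handed to setlocale: "" means "take the locale from the
   environment" (LC_ALL first, then the variable named after each
   category, such as LC_MESSAGES, then LANG), while a name such as
   "C" or "de_DE.UTF-8" requests that locale explicitly.
   Returns the name of the locale now used for messages, or NULL if
   the requested locale is not available.  */
const char *
select_locale (const char *name)
{
  /* On failure, setlocale returns NULL and leaves the locale unchanged.  */
  if (setlocale (LC_ALL, name) == NULL)
    return NULL;

  /* A NULL second argument only queries the current setting.  */
  return setlocale (LC_MESSAGES, NULL);
}
```

<P>
A program would typically call <CODE>select_locale ("")</CODE> once, early
in <CODE>main</CODE>.  Running it as <SAMP>‘LANG=fr_FR.UTF-8 prog’</SAMP>
would then select French conventions, provided that this locale is
installed on the system.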
</P>
<P>
In fact, locale message support is only one component of the cultural
data that makes up a particular locale.  There is a whole host of
routines and functions that aid programmers in developing
internationalized software and that allow them to access the data
stored in a particular locale.  When someone refers to a
particular locale, they are referring to the data stored
within that particular locale.  Similarly, if a programmer is referring
to “accessing the locale routines”, they are referring to the
complete suite of routines that access all of the locale's information.

</P>
<P>
<A NAME="IDX10"></A>
<A NAME="IDX11"></A>
<A NAME="IDX12"></A>
One uses the expression <EM>Native Language Support</EM>, or merely NLS,
to refer to the overall activity or feature encompassing both
internationalization and localization, allowing for multi-lingual
interactions in a program.  In a nutshell, one could say that
internationalization is the operation by which further localizations
are made possible.

</P>
<P>
Also, very roughly speaking, when it comes to multi-lingual messages,
internationalization is usually taken care of by programmers, and
localization is usually taken care of by translators.
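</P>
<P>
To make the programmer's share of this division concrete, here is a
minimal sketch.  It assumes a C library that provides <CODE>gettext</CODE>
(such as GNU libc; elsewhere, link with <CODE>libintl</CODE>).  The text
domain <CODE>"hello-sketch"</CODE> and the functions <CODE>init_i18n</CODE>
and <CODE>greeting</CODE> are hypothetical names.  When no catalog is found
for the current locale, <CODE>gettext</CODE> returns its argument unchanged,
so the program still works in English.
</P>

```c
#include <libintl.h>
#include <locale.h>

#define _(String) gettext (String)

/* Hypothetical text domain: the base name of the installed .mo files.  */
#define TEXT_DOMAIN "hello-sketch"

/* Programmer's side: select the user's locale and tell gettext where
   this package's catalogs live (LOCALEDIR/LANG/LC_MESSAGES/*.mo).  */
void
init_i18n (const char *localedir)
{
  setlocale (LC_ALL, "");
  bindtextdomain (TEXT_DOMAIN, localedir);
  textdomain (TEXT_DOMAIN);
}

/* A marked string: the translator supplies its translation in LANG.po.
   Without a matching catalog, the English original is returned.  */
const char *
greeting (void)
{
  return _("Hello, world!");
}
```

<P>
Everything else, producing <TT>‘<VAR>lang</VAR>.po’</TT> and the installed
catalog, is then the translator's and installer's business, not the
programmer's.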
</P>



<H2><A NAME="SEC4" HREF="gettext_toc.html#TOC4">1.3 Aspects in Native Language Support</A></H2>

<P>
<A NAME="IDX13"></A>
For a totally multi-lingual distribution, there are many things to
translate beyond output messages.

</P>

<UL>
<LI>

As of today, GNU <CODE>gettext</CODE> offers a complete toolset for
translating messages output by C programs.  Perl scripts and shell
scripts will also need to be translated.  Although some hooks for
doing this exist today, they are not integrated as well as they
should be.

<LI>

Some programs, like <CODE>autoconf</CODE> or <CODE>bison</CODE>, are able
to produce other programs (or scripts).  Even if the generating
programs themselves are internationalized, the generated programs they
produce may need internationalization of their own, and this indirect
internationalization could be automated right from the generating
program.  In fact, quite often, generating and generated programs
could be internationalized independently, as the effort needed is
fairly orthogonal.

<LI>

A few programs include textual tables which might need translation
themselves, independently of the strings contained in the program
itself.
For example, RFC 1345 gives an English description for each
character which the <CODE>recode</CODE> program is able to reconstruct at run time.
Since these descriptions are extracted from the RFC by mechanical means,
translating them properly would require a prior translation of the RFC
itself.

<LI>

Almost all programs accept options, which are often worded so as to
be descriptive for English readers; one might want to consider
offering translated versions of program options as well.

<LI>

Many programs read, interpret, compile, or are somewhat driven by
input files which are texts containing keywords, identifiers, or
replies which are inherently translatable.  For example, one may want
<CODE>gcc</CODE> to allow diacriticized characters in identifiers or use
translated keywords; <SAMP>‘rm -i’</SAMP> might accept something other than
<SAMP>‘y’</SAMP> or <SAMP>‘n’</SAMP> for replies, etc.  Even if the program
eventually produces most of its output in foreign languages, one has
to decide whether the input syntax, option values, etc., are to be
localized or not.

<LI>

The manual accompanying a package, as well as all documentation files
in the distribution, could surely be translated, too.  Translating a
manual, with the intent of later keeping up with updates, is generally
a major undertaking in itself.
</UL>

<P>
As we already stressed, translation is only one aspect of locales.
Other internationalization aspects are system services and are handled
in GNU <CODE>libc</CODE>.  There
are many attributes that are needed to define a country's cultural
conventions.  Besides the country's native language, these attributes
include the formatting of dates and times, the representation of
numbers, the symbols for currency, etc.  These local <EM>rules</EM> are
termed the country's locale.  The locale represents the knowledge
needed to support the country's native attributes.

</P>
<P>
<A NAME="IDX14"></A>
There are a few major areas which may vary between countries and
hence define what a locale must describe.  The following list helps
put multi-lingual messages into the proper context of other tasks
related to locales.  See the GNU <CODE>libc</CODE> manual for details.

</P>
<DL COMPACT>

<DT><EM>Characters and Codesets</EM>
<DD>
<A NAME="IDX15"></A>
<A NAME="IDX16"></A>
<A NAME="IDX17"></A>
<A NAME="IDX18"></A>

The codeset most commonly used throughout the USA and most English-speaking
parts of the world is the ASCII codeset.  However, there are
many characters needed by various locales that are not found within
this codeset.
The 8-bit ISO 8859-1 codeset has most of the special
characters needed to handle the major European languages.  However, in
many cases, choosing ISO 8859-1 is nevertheless not adequate: it
lacks even the sign for the euro, the major European currency.  Hence each
locale will need to specify which codeset it uses and will need
to have the appropriate character handling routines to cope with
that codeset.

<DT><EM>Currency</EM>
<DD>
<A NAME="IDX19"></A>
<A NAME="IDX20"></A>

The symbols used vary from country to country, as does the position
of the symbol relative to the amount.  Software needs to be able to
transparently display currency figures in the native mode for each locale.

<DT><EM>Dates</EM>
<DD>
<A NAME="IDX21"></A>
<A NAME="IDX22"></A>

The date format varies between locales.  For example, Christmas day
in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
Other countries might use ISO 8601 dates (1994-12-25), etc.

The time of day may be noted as <VAR>hh</VAR>:<VAR>mm</VAR>, <VAR>hh</VAR>.<VAR>mm</VAR>,
or otherwise.  Some locales require time to be specified in 24-hour
mode rather than as AM or PM.  Further, the nature and yearly extent
of the Daylight Saving correction vary widely between countries.
<DT><EM>Numbers</EM>
<DD>
<A NAME="IDX23"></A>
<A NAME="IDX24"></A>

Numbers can be represented differently in different locales.
For example, the following numbers are all written correctly for
their respective locales:


<PRE>
12,345.67       English
12.345,67       German
 12345,67       French
1,2345.67       Asia
</PRE>

Some programs could go further and use different unit systems, like
English or metric units, or even take into account variations
in how numbers are spelled out in full.

<DT><EM>Messages</EM>
<DD>
<A NAME="IDX25"></A>
<A NAME="IDX26"></A>

The most obvious area is the language support within a locale.  This is
where GNU <CODE>gettext</CODE> provides the means for developers and users to
easily change the language that the software uses to communicate with
the user.

</DL>

<P>
<A NAME="IDX27"></A>
Components of locales outside of message handling are standardized in
the ISO C standard and the SUSV2 specification.  GNU <CODE>libc</CODE>
fully implements this, and most other modern systems provide more
or less reasonable support for at least some of these components.
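</P>
<P>
As a rough illustration of these ISO C components, here is a minimal
sketch that uses only standard C library calls.  <CODE>localeconv</CODE>
reports the numeric conventions of the current locale,
<CODE>snprintf</CODE> uses its decimal point, and <CODE>strftime</CODE>
with the <CODE>%x</CODE> conversion writes a date in the locale's
preferred format.  The helper names <CODE>decimal_point</CODE>,
<CODE>format_amount</CODE> and <CODE>format_date</CODE> are hypothetical.
</P>

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: the decimal separator of the current
   LC_NUMERIC locale, e.g. "." in the "C" locale.  */
const char *
decimal_point (void)
{
  return localeconv ()->decimal_point;
}

/* Hypothetical helper: formats VALUE with two decimals.  "%.2f" uses
   the LC_NUMERIC decimal point but no thousands separator.
   Returns what snprintf returns.  */
int
format_amount (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "%.2f", value);
}

/* Hypothetical helper: writes YEAR-MONTH-DAY in the current LC_TIME
   locale's preferred date format ("%x").  Returns the number of
   characters written, or 0 if BUF is too small.  */
size_t
format_date (char *buf, size_t size, int year, int month, int day)
{
  struct tm tm;

  memset (&tm, 0, sizeof tm);
  tm.tm_year = year - 1900;   /* struct tm counts years from 1900 */
  tm.tm_mon = month - 1;      /* and months from 0.  */
  tm.tm_mday = day;
  return strftime (buf, size, "%x", &tm);
}
```

<P>
In the <CODE>"C"</CODE> locale, <CODE>format_date (buf, sizeof buf, 1994, 12, 25)</CODE>
produces <SAMP>‘12/25/94’</SAMP>, the US form of Christmas day 1994.  After
<CODE>setlocale (LC_ALL, "")</CODE> in, say, a German environment, the same
calls would be expected to follow German conventions instead, assuming
that locale is installed.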
</P>


<H2><A NAME="SEC5" HREF="gettext_toc.html#TOC5">1.4 Files Conveying Translations</A></H2>

<P>
<A NAME="IDX28"></A>
The letters PO in <TT>‘.po’</TT> files stand for Portable Object, to
distinguish these files from <TT>‘.mo’</TT> files, where MO stands for Machine
Object.  This paradigm, as well as the PO file format, is inspired
by the NLS standard developed by Uniforum, and first implemented by
Sun in their Solaris system.

</P>
<P>
PO files are meant to be read and edited by humans, and associate each
original, translatable string of a given package with its translation
in a particular target language.  A single PO file is dedicated to
a single target language.  If a package supports many languages,
there is one such PO file per language supported, and each package
has its own set of PO files.  These PO files are best created by
the <CODE>xgettext</CODE> program, and later updated or refreshed through
the <CODE>msgmerge</CODE> program.  The <CODE>xgettext</CODE> program extracts all
marked messages from a set of C files and initializes a PO file with
empty translations.  The <CODE>msgmerge</CODE> program takes care of adjusting
PO files between releases of the corresponding sources, commenting out
obsolete entries, initializing new ones, and updating all source
line references.
Files ending with <TT>‘.pot’</TT> are a kind of base
translation file found in distributions, in PO file format.

</P>
<P>
MO files are meant to be read by programs, and are binary in nature.
A few systems already offer tools for creating and handling MO files
as part of the Native Language Support coming with the system, but the
format of these MO files often differs from system to system,
and is non-portable.  The tools already provided with these systems don't
support all the features of GNU <CODE>gettext</CODE>.  Therefore GNU
<CODE>gettext</CODE> uses its own format for MO files.  Files ending with
<TT>‘.gmo’</TT> are MO files that are known to use the GNU format.

</P>


<H2><A NAME="SEC6" HREF="gettext_toc.html#TOC6">1.5 Overview of GNU <CODE>gettext</CODE></A></H2>

<P>
<A NAME="IDX29"></A>
<A NAME="IDX30"></A>
<A NAME="IDX31"></A>
The following diagram summarizes the relation between the files
handled by GNU <CODE>gettext</CODE> and the tools acting on these files.
It is followed by somewhat detailed explanations, which you should
read while keeping an eye on the diagram.  Having a clear understanding
of these interrelations will surely help programmers, translators
and maintainers.
</P>

<PRE>
Original C Sources ───> Preparation ───> Marked C Sources ───╮
                                                             │
              ╭─────────<─── GNU gettext Library             │
╭─── make <───┤                                              │
│             ╰─────────<────────────────────┬───────────────╯
│                                            │
│   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
│   │                                            │              ↑
│   │                                            ╰───╮          │
│   ╰───╮                                            ├───> PO editor ───╮
│       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
│   ╭───╯                                                               │
│   │                                                                   │
│   ╰─────────────<───────────────╮                                     │
│                                 ├─── New LANG.po <────────────────────╯
│   ╭─── LANG.gmo <─── msgfmt <───╯
│   │
│   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
│                                              ├───> "Hello world!"
╰───────> install ───> /.../bin/PROGRAM ───────╯
</PRE>

<P>
<A NAME="IDX32"></A>
As a programmer, the first step in bringing GNU <CODE>gettext</CODE>
into your package is identifying, right in the C sources, those strings
which are meant to be translatable, and those which are untranslatable.
This tedious job can be done a little more comfortably using Emacs PO
mode, but you can use any means familiar to you for modifying your
C sources.  Besides this, a few other simple, standard changes are needed to
properly initialize the translation library.
See section <A HREF="gettext_4.html#SEC11">4 Preparing Program Sources</A>, for
more information about all this.

</P>
<P>
For newly written software, the strings can and should, of course, be
marked while the code is being written.  The <CODE>gettext</CODE> approach
makes this very easy.  Simply put the following lines at the beginning
of each file or in a central header file:

</P>

<PRE>
/* Placeholders: mark strings now, without using the gettext library.  */
#define _(String) (String)
#define N_(String) String
#define textdomain(Domain)
#define bindtextdomain(Package, Directory)
</PRE>

<P>
Doing this allows you to prepare the sources for internationalization.
Later, when you are ready to use the <CODE>gettext</CODE> library,
simply replace these definitions with the following:

</P>
<P>
<A NAME="IDX33"></A>

<PRE>
#include <libintl.h>
/* _() looks up a translation at run time; N_() only marks a string.  */
#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
</PRE>

<P>
<A NAME="IDX34"></A>
<A NAME="IDX35"></A>
and link against <TT>‘libintl.a’</TT> or <TT>‘libintl.so’</TT>.  Note that on
GNU systems, you don't need to link with <CODE>libintl</CODE> because the
<CODE>gettext</CODE> library functions are already contained in GNU libc.
That is all you have to change.

</P>
<P>
<A NAME="IDX36"></A>
<A NAME="IDX37"></A>
Once the C sources have been modified, the <CODE>xgettext</CODE> program
is used to find and extract all translatable strings, and to create a
PO template file from them.  This <TT>‘<VAR>package</VAR>.pot’</TT> file
contains all original program strings.  It records exactly where
in the C sources each string is used.  All translations
are set to empty.  The letter <CODE>t</CODE> in <TT>‘.pot’</TT> marks this as
a Template PO file, not yet oriented towards any particular language.
See section <A HREF="gettext_5.html#SEC22">5.1 Invoking the <CODE>xgettext</CODE> Program</A>, for more details about how one calls the
<CODE>xgettext</CODE> program.  If you are <EM>really</EM> lazy, you might
be interested in working a lot more right away, and preparing the
whole distribution setup (see section <A HREF="gettext_13.html#SEC196">13 The Maintainer's View</A>).  By doing so, you
spare yourself typing the <CODE>xgettext</CODE> command, as <CODE>make</CODE>
should now generate the proper things automatically for you!

</P>
<P>
The first time through, there is no <TT>‘<VAR>lang</VAR>.po’</TT> yet, so the
<CODE>msgmerge</CODE> step may be skipped and replaced by a mere copy of
<TT>‘<VAR>package</VAR>.pot’</TT> to <TT>‘<VAR>lang</VAR>.po’</TT>, where <VAR>lang</VAR>
represents the target language.
See section <A HREF="gettext_6.html#SEC31">6 Creating a New PO File</A> for details.

</P>
<P>
Then comes the initial translation of messages. Translation
itself is a whole matter, still exclusively meant for humans,
and its complexity goes far beyond the scope of this manual.
Nevertheless, a few hints are given in another chapter of this
manual (see section <A HREF="gettext_12.html#SEC184">12 The Translator's View</A>). There you will also find indications
about how to contact translation teams, or become part of them,
to share your translation concerns with others who target the same
native language.

</P>
<P>
While adding the translated messages into the <TT>‘<VAR>lang</VAR>.po’</TT>
PO file, if you are not using one of the dedicated PO file editors
(see section <A HREF="gettext_8.html#SEC49">8 Editing PO Files</A>), you are on your own
for ensuring that your work fully respects the PO file format and quoting
conventions (see section <A HREF="gettext_3.html#SEC10">3 The Format of PO Files</A>). This is certainly not an impossible task,
as this is how many people handled PO files around 1995.
On the other hand, by using a PO file editor, most details
of the PO file format are taken care of for you, but you have to acquire
some familiarity with the PO file editor itself.
</P>
<P>
If some common translations have already been saved into a compendium
PO file, translators may use PO mode to initialize untranslated
entries from the compendium, and also to save selected translations into
the compendium, updating it (see section <A HREF="gettext_8.html#SEC66">8.3.14 Using Translation Compendia</A>). Compendium files
are meant to be exchanged between members of a given translation team.

</P>
<P>
Programs, or packages of programs, are dynamic in nature: users write
bug reports and suggestions for improvements, and maintainers react by
modifying programs in various ways. The fact that a package has
already been internationalized should not make maintainers shy
about adding new strings or modifying strings already translated.
They just do their job as best they can. For the Translation
Project to work smoothly, it is important that maintainers do not
carry translation concerns on their already loaded shoulders, and that
translators be kept as free as possible of programming concerns.

</P>
<P>
The only concern maintainers should have is to carefully mark new
strings as translatable, where appropriate, and otherwise not to
worry about them being translated, as this will happen in due time.
Consequently, as maintainers adjust programs and their strings in various
ways, usually for reasons unrelated to translation,
<CODE>xgettext</CODE> constructs <TT>‘<VAR>package</VAR>.pot’</TT> files that
evolve over time, and the translations carried by <TT>‘<VAR>lang</VAR>.po’</TT>
slowly fall out of date.

</P>
<P>
<A NAME="IDX38"></A>
It is important for translators (and even maintainers) to understand
that package translation is a continuous process in the lifetime of a
package, and not something that is done once and for all at the start.
After an initial burst of translation activity for a given package,
interventions are needed once in a while, because here and there
translated entries become obsolete and new untranslated entries
appear that need translation.

</P>
<P>
The purpose of the <CODE>msgmerge</CODE> program is to refresh an already
existing <TT>‘<VAR>lang</VAR>.po’</TT> file by comparing it with a newer
<TT>‘<VAR>package</VAR>.pot’</TT> template file, extracted by <CODE>xgettext</CODE>
from recent C sources. The refreshing operation adjusts all
references to C source locations for strings, since these strings
move as programs are modified.
Also, <CODE>msgmerge</CODE> comments out as
obsolete, in <TT>‘<VAR>lang</VAR>.po’</TT>, those already translated entries
that are no longer used in the program sources (see section <A HREF="gettext_8.html#SEC60">8.3.8 Obsolete Entries</A>). Finally, it discovers new strings and inserts them in
the resulting PO file as untranslated entries (see section <A HREF="gettext_8.html#SEC59">8.3.7 Untranslated Entries</A>). See section <A HREF="gettext_7.html#SEC40">7.1 Invoking the <CODE>msgmerge</CODE> Program</A>, for more information about what
<CODE>msgmerge</CODE> really does.

</P>
<P>
Whatever route or means taken, the goal is to obtain an updated
<TT>‘<VAR>lang</VAR>.po’</TT> file offering translations for all strings.

</P>
<P>
The temporal mobility, or fluidity, of PO files is an integral part of
the translation game, and should be well understood and accepted.
People resisting it will have a hard time participating in the
Translation Project, or will give other participants a hard time! In
particular, maintainers should relax and include all available official
PO files in their distributions, even if these have not recently been
updated, without exerting pressure on the translator teams to get the
job done. The pressure should rather come
from the community of users speaking a particular language, and
maintainers should consider themselves fairly relieved of any concern
about the adequacy of translation files.
On the other hand, translators
should make a reasonable effort to update the PO files they are responsible for
while the package is undergoing pretest, prior to an official
distribution.

</P>
<P>
Once the PO file is complete and dependable, the <CODE>msgfmt</CODE> program
is used to turn the PO file into a machine-oriented format, which
allows the programs of the package to retrieve translations efficiently
whenever needed at runtime (see section <A HREF="gettext_10.html#SEC163">10.3 The Format of GNU MO Files</A>). See section <A HREF="gettext_10.html#SEC143">10.1 Invoking the <CODE>msgfmt</CODE> Program</A>, for more information about all modes of execution
for the <CODE>msgfmt</CODE> program.

</P>
<P>
Finally, the modified and marked C sources are compiled and linked
with the GNU <CODE>gettext</CODE> library, usually by running
<CODE>make</CODE>, provided a suitable <TT>‘Makefile’</TT> exists for the project,
and the resulting executable is installed somewhere users will find it.
The MO files themselves should also be properly installed. Provided the
appropriate environment variables are set (see section <A HREF="gettext_2.html#SEC9">2.2 Magic for End Users</A>), the
program should localize itself automatically whenever it executes.

</P>
<P>
The remainder of this manual explains in depth the various
steps outlined above.
</P>
<P><HR><P>
</BODY>
</HTML>