1\input texinfo @c -*-texinfo-*- 2@c %**start of header 3@c The @documentencoding is needed for makeinfo, but not for texi2html. 4@c @ifhtml 5@c @documentencoding UTF-8 6@c @end ifhtml 7@setfilename gettext.info 8@settitle GNU @code{gettext} utilities 9@finalout 10@c Indices: 11@c am = autoconf macro @amindex 12@c cp = concept @cindex 13@c ef = emacs function @efindex 14@c em = emacs mode @emindex 15@c ev = emacs variable @evindex 16@c fn = function @findex 17@c kw = keyword @kwindex 18@c op = option @opindex 19@c pg = program @pindex 20@c vr = variable @vindex 21@c Unused predefined indices: 22@c tp = type @tindex 23@c ky = keystroke @kindex 24@defcodeindex am 25@defcodeindex ef 26@defindex em 27@defcodeindex ev 28@defcodeindex kw 29@defcodeindex op 30@syncodeindex ef em 31@syncodeindex ev em 32@syncodeindex fn cp 33@syncodeindex kw cp 34@c %**end of header 35 36@include version.texi 37 38@ifinfo 39@dircategory GNU Gettext Utilities 40@direntry 41* gettext: (gettext). GNU gettext utilities. 42* autopoint: (gettext)autopoint Invocation. Copy gettext infrastructure. 43* envsubst: (gettext)envsubst Invocation. Expand environment variables. 44* gettextize: (gettext)gettextize Invocation. Prepare a package for gettext. 45* msgattrib: (gettext)msgattrib Invocation. Select part of a PO file. 46* msgcat: (gettext)msgcat Invocation. Combine several PO files. 47* msgcmp: (gettext)msgcmp Invocation. Compare a PO file and template. 48* msgcomm: (gettext)msgcomm Invocation. Match two PO files. 49* msgconv: (gettext)msgconv Invocation. Convert PO file to encoding. 50* msgen: (gettext)msgen Invocation. Create an English PO file. 51* msgexec: (gettext)msgexec Invocation. Process a PO file. 52* msgfilter: (gettext)msgfilter Invocation. Pipe a PO file through a filter. 53* msgfmt: (gettext)msgfmt Invocation. Make MO files out of PO files. 54* msggrep: (gettext)msggrep Invocation. Select part of a PO file. 55* msginit: (gettext)msginit Invocation. Create a fresh PO file. 56* msgmerge: (gettext)msgmerge Invocation. Update a PO file from template. 57* msgunfmt: (gettext)msgunfmt Invocation. Uncompile MO file into PO file. 58* msguniq: (gettext)msguniq Invocation. Unify duplicates for PO file. 59* ngettext: (gettext)ngettext Invocation. Translate a message with plural. 60* xgettext: (gettext)xgettext Invocation. Extract strings into a PO file. 61* ISO639: (gettext)Language Codes. ISO 639 language codes. 62* ISO3166: (gettext)Country Codes. ISO 3166 country codes. 63@end direntry 64@end ifinfo 65 66@ifinfo 67This file provides documentation for GNU @code{gettext} utilities. 68It also serves as a reference for the free Translation Project. 69 70@copying 71Copyright (C) 1995-1998, 2001-2006 Free Software Foundation, Inc. 72 73This manual is free documentation. It is dually licensed under the 74GNU FDL and the GNU GPL. This means that you can redistribute this 75manual under either of these two licenses, at your choice. 76 77This manual is covered by the GNU FDL. Permission is granted to copy, 78distribute and/or modify this document under the terms of the 79GNU Free Documentation License (FDL), either version 1.2 of the 80License, or (at your option) any later version published by the 81Free Software Foundation (FSF); with no Invariant Sections, with no 82Front-Cover Text, and with no Back-Cover Texts. 83A copy of the license is included in @ref{GNU FDL}. 84 85This manual is covered by the GNU GPL. You can redistribute it and/or 86modify it under the terms of the GNU General Public License (GPL), either 87version 2 of the License, or (at your option) any later version published 88by the Free Software Foundation (FSF). 89A copy of the license is included in @ref{GNU GPL}. 90@end copying 91@end ifinfo 92 93@titlepage 94@title GNU gettext tools, version @value{VERSION} 95@subtitle Native Language Support Library and Tools 96@subtitle Edition @value{EDITION}, @value{UPDATED} 97@author Ulrich Drepper 98@author Jim Meyering 99@author Fran@,{c}ois Pinard 100@author Bruno Haible 101 102@ifnothtml 103@page 104@vskip 0pt plus 1filll 105@c @insertcopying 106Copyright (C) 1995-1998, 2001-2006 Free Software Foundation, Inc. 107 108This manual is free documentation. It is dually licensed under the 109GNU FDL and the GNU GPL. This means that you can redistribute this 110manual under either of these two licenses, at your choice. 111 112This manual is covered by the GNU FDL. Permission is granted to copy, 113distribute and/or modify this document under the terms of the 114GNU Free Documentation License (FDL), either version 1.2 of the 115License, or (at your option) any later version published by the 116Free Software Foundation (FSF); with no Invariant Sections, with no 117Front-Cover Text, and with no Back-Cover Texts. 118A copy of the license is included in @ref{GNU FDL}. 119 120This manual is covered by the GNU GPL. You can redistribute it and/or 121modify it under the terms of the GNU General Public License (GPL), either 122version 2 of the License, or (at your option) any later version published 123by the Free Software Foundation (FSF). 124A copy of the license is included in @ref{GNU GPL}. 125@end ifnothtml 126@end titlepage 127 128@ifnottex 129@c Table of Contents 130@contents 131@end ifnottex 132 133@ifinfo 134@node Top, Introduction, (dir), (dir) 135@top GNU @code{gettext} utilities 136 137This manual documents the GNU gettext tools and the GNU libintl library, 138version @value{VERSION}. 139 140@menu 141* Introduction:: Introduction 142* Users:: The User's View 143* PO Files:: The Format of PO Files 144* Sources:: Preparing Program Sources 145* Template:: Making the PO Template File 146* Creating:: Creating a New PO File 147* Updating:: Updating Existing PO Files 148* Editing:: Editing PO Files 149* Manipulating:: Manipulating PO Files 150* Binaries:: Producing Binary MO Files 151* Programmers:: The Programmer's View 152* Translators:: The Translator's View 153* Maintainers:: The Maintainer's View 154* Installers:: The Installer's and Distributor's View 155* Programming Languages:: Other Programming Languages 156* Conclusion:: Concluding Remarks 157 158* Language Codes:: ISO 639 language codes 159* Country Codes:: ISO 3166 country codes 160* Licenses:: Licenses 161 162* Program Index:: Index of Programs 163* Option Index:: Index of Command-Line Options 164* Variable Index:: Index of Environment Variables 165* PO Mode Index:: Index of Emacs PO Mode Commands 166* Autoconf Macro Index:: Index of Autoconf Macros 167* Index:: General Index 168 169@detailmenu 170 --- The Detailed Node Listing --- 171 172Introduction 173 174* Why:: The Purpose of GNU @code{gettext} 175* Concepts:: I18n, L10n, and Such 176* Aspects:: Aspects in Native Language Support 177* Files:: Files Conveying Translations 178* Overview:: Overview of GNU @code{gettext} 179 180The User's View 181 182* Matrix:: The Current @file{ABOUT-NLS} Matrix 183* End Users:: Magic for End Users 184 185Preparing Program Sources 186 187* Importing:: Importing the @code{gettext} declaration 188* Triggering:: Triggering @code{gettext} Operations 189* Preparing Strings:: Preparing Translatable Strings 190* Mark Keywords:: How Marks Appear in Sources 191* Marking:: Marking Translatable Strings 192* c-format Flag:: Telling something about the following string 193* Special cases:: Special Cases of Translatable Strings 194* Names:: Marking Proper Names for Translation 195* Libraries:: Preparing Library Sources 196 197Making the PO Template File 198 199* xgettext Invocation:: Invoking the @code{xgettext} Program 200 201Creating a New PO File 202 203* msginit Invocation:: Invoking the @code{msginit} Program 204* Header Entry:: Filling in the Header Entry 205 206Updating Existing PO Files 207 208* msgmerge Invocation:: Invoking the @code{msgmerge} Program 209 210Editing PO Files 211 212* KBabel:: KDE's PO File Editor 213* Gtranslator:: GNOME's PO File Editor 214* PO Mode:: Emacs's PO File Editor 215 216Emacs's PO File Editor 217 218* Installation:: Completing GNU @code{gettext} Installation 219* Main PO Commands:: Main Commands 220* Entry Positioning:: Entry Positioning 221* Normalizing:: Normalizing Strings in Entries 222* Translated Entries:: Translated Entries 223* Fuzzy Entries:: Fuzzy Entries 224* Untranslated Entries:: Untranslated Entries 225* Obsolete Entries:: Obsolete Entries 226* Modifying Translations:: Modifying Translations 227* Modifying Comments:: Modifying Comments 228* Subedit:: Mode for Editing Translations 229* C Sources Context:: C Sources Context 230* Auxiliary:: Consulting Auxiliary PO Files 231* Compendium:: Using Translation Compendia 232 233Using Translation Compendia 234 235* Creating Compendia:: Merging translations for later use 236* Using Compendia:: Using older translations if they fit 237 238Manipulating PO Files 239 240* msgcat Invocation:: Invoking the @code{msgcat} Program 241* msgconv Invocation:: Invoking the @code{msgconv} Program 242* msggrep Invocation:: Invoking the @code{msggrep} Program 243* msgfilter Invocation:: Invoking the @code{msgfilter} Program 244* msguniq Invocation:: Invoking the @code{msguniq} Program 245* msgcomm Invocation:: Invoking the @code{msgcomm} Program 246* msgcmp Invocation:: Invoking the @code{msgcmp} Program 247* msgattrib Invocation:: Invoking the @code{msgattrib} Program 248* msgen Invocation:: Invoking the @code{msgen} Program 249* msgexec Invocation:: Invoking the @code{msgexec} Program 250* libgettextpo:: Writing your own programs that process PO files 251 252Producing Binary MO Files 253 254* msgfmt Invocation:: Invoking the @code{msgfmt} Program 255* msgunfmt Invocation:: Invoking the @code{msgunfmt} Program 256* MO Files:: The Format of GNU MO Files 257 258The Programmer's View 259 260* catgets:: About @code{catgets} 261* gettext:: About @code{gettext} 262* Comparison:: Comparing the two interfaces 263* Using libintl.a:: Using libintl.a in own programs 264* gettext grok:: Being a @code{gettext} grok 265* Temp Programmers:: Temporary Notes for the Programmers Chapter 266 267About @code{catgets} 268 269* Interface to catgets:: The interface 270* Problems with catgets:: Problems with the @code{catgets} interface?! 271 272About @code{gettext} 273 274* Interface to gettext:: The interface 275* Ambiguities:: Solving ambiguities 276* Locating Catalogs:: Locating message catalog files 277* Charset conversion:: How to request conversion to Unicode 278* Contexts:: Solving ambiguities in GUI programs 279* Plural forms:: Additional functions for handling plurals 280* Optimized gettext:: Optimization of the *gettext functions 281 282Temporary Notes for the Programmers Chapter 283 284* Temp Implementations:: Temporary - Two Possible Implementations 285* Temp catgets:: Temporary - About @code{catgets} 286* Temp WSI:: Temporary - Why a single implementation 287* Temp Notes:: Temporary - Notes 288 289The Translator's View 290 291* Trans Intro 0:: Introduction 0 292* Trans Intro 1:: Introduction 1 293* Discussions:: Discussions 294* Organization:: Organization 295* Information Flow:: Information Flow 296* Prioritizing messages:: How to find which messages to translate first 297 298Organization 299 300* Central Coordination:: Central Coordination 301* National Teams:: National Teams 302* Mailing Lists:: Mailing Lists 303 304National Teams 305 306* Sub-Cultures:: Sub-Cultures 307* Organizational Ideas:: Organizational Ideas 308 309The Maintainer's View 310 311* Flat and Non-Flat:: Flat or Non-Flat Directory Structures 312* Prerequisites:: Prerequisite Works 313* gettextize Invocation:: Invoking the @code{gettextize} Program 314* Adjusting Files:: Files You Must Create or Alter 315* autoconf macros:: Autoconf macros for use in @file{configure.in} 316* CVS Issues:: Integrating with CVS 317* Release Management:: Creating a Distribution Tarball 318 319Files You Must Create or Alter 320 321* po/POTFILES.in:: @file{POTFILES.in} in @file{po/} 322* po/LINGUAS:: @file{LINGUAS} in @file{po/} 323* po/Makevars:: @file{Makevars} in @file{po/} 324* po/Rules-*:: Extending @file{Makefile} in @file{po/} 325* configure.in:: @file{configure.in} at top level 326* config.guess:: @file{config.guess}, @file{config.sub} at top level 327* mkinstalldirs:: @file{mkinstalldirs} at top level 328* aclocal:: @file{aclocal.m4} at top level 329* acconfig:: @file{acconfig.h} at top level 330* config.h.in:: @file{config.h.in} at top level 331* Makefile:: @file{Makefile.in} at top level 332* src/Makefile:: @file{Makefile.in} in @file{src/} 333* lib/gettext.h:: @file{gettext.h} in @file{lib/} 334 335Autoconf macros for use in @file{configure.in} 336 337* AM_GNU_GETTEXT:: AM_GNU_GETTEXT in @file{gettext.m4} 338* AM_GNU_GETTEXT_VERSION:: AM_GNU_GETTEXT_VERSION in @file{gettext.m4} 339* AM_GNU_GETTEXT_NEED:: AM_GNU_GETTEXT_NEED in @file{gettext.m4} 340* AM_GNU_GETTEXT_INTL_SUBDIR:: AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4} 341* AM_PO_SUBDIRS:: AM_PO_SUBDIRS in @file{po.m4} 342* AM_ICONV:: AM_ICONV in @file{iconv.m4} 343 344Integrating with CVS 345 346* Distributed CVS:: Avoiding version mismatch in distributed development 347* Files under CVS:: Files to put under CVS version control 348* autopoint Invocation:: Invoking the @code{autopoint} Program 349 350Other Programming Languages 351 352* Language Implementors:: The Language Implementor's View 353* Programmers for other Languages:: The Programmer's View 354* Translators for other Languages:: The Translator's View 355* Maintainers for other Languages:: The Maintainer's View 356* List of Programming Languages:: Individual Programming Languages 357* List of Data Formats:: Internationalizable Data 358 359The Translator's View 360 361* c-format:: C Format Strings 362* objc-format:: Objective C Format Strings 363* sh-format:: Shell Format Strings 364* python-format:: Python Format Strings 365* lisp-format:: Lisp Format Strings 366* elisp-format:: Emacs Lisp Format Strings 367* librep-format:: librep Format Strings 368* scheme-format:: Scheme Format Strings 369* smalltalk-format:: Smalltalk Format Strings 370* java-format:: Java Format Strings 371* csharp-format:: C# Format Strings 372* awk-format:: awk Format Strings 373* object-pascal-format:: Object Pascal Format Strings 374* ycp-format:: YCP Format Strings 375* tcl-format:: Tcl Format Strings 376* perl-format:: Perl Format Strings 377* php-format:: PHP Format Strings 378* gcc-internal-format:: GCC internal Format Strings 379* qt-format:: Qt Format Strings 380* boost-format:: Boost Format Strings 381 382Individual Programming Languages 383 384* C:: C, C++, Objective C 385* sh:: sh - Shell Script 386* bash:: bash - Bourne-Again Shell Script 387* Python:: Python 388* Common Lisp:: GNU clisp - Common Lisp 389* clisp C:: GNU clisp C sources 390* Emacs Lisp:: Emacs Lisp 391* librep:: librep 392* Scheme:: GNU guile - Scheme 393* Smalltalk:: GNU Smalltalk 394* Java:: Java 395* C#:: C# 396* gawk:: GNU awk 397* Pascal:: Pascal - Free Pascal Compiler 398* wxWidgets:: wxWidgets library 399* YCP:: YCP - YaST2 scripting language 400* Tcl:: Tcl - Tk's scripting language 401* Perl:: Perl 402* PHP:: PHP Hypertext Preprocessor 403* Pike:: Pike 404* GCC-source:: GNU Compiler Collection sources 405 406sh - Shell Script 407 408* Preparing Shell Scripts:: Preparing Shell Scripts for Internationalization 409* gettext.sh:: Contents of @code{gettext.sh} 410* gettext Invocation:: Invoking the @code{gettext} program 411* ngettext Invocation:: Invoking the @code{ngettext} program 412* envsubst Invocation:: Invoking the @code{envsubst} program 413* eval_gettext Invocation:: Invoking the @code{eval_gettext} function 414* eval_ngettext Invocation:: Invoking the @code{eval_ngettext} function 415 416Perl 417 418* General Problems:: General Problems Parsing Perl Code 419* Default Keywords:: Which Keywords Will xgettext Look For? 420* Special Keywords:: How to Extract Hash Keys 421* Quote-like Expressions:: What are Strings And Quote-like Expressions? 422* Interpolation I:: Invalid String Interpolation 423* Interpolation II:: Valid String Interpolation 424* Parentheses:: When To Use Parentheses 425* Long Lines:: How To Grok with Long Lines 426* Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work 427 428Internationalizable Data 429 430* POT:: POT - Portable Object Template 431* RST:: Resource String Table 432* Glade:: Glade - GNOME user interface description 433 434Concluding Remarks 435 436* History:: History of GNU @code{gettext} 437* References:: Related Readings 438 439Language Codes 440 441* Usual Language Codes:: Two-letter ISO 639 language codes 442* Rare Language Codes:: Three-letter ISO 639 language codes 443 444Licenses 445 446* GNU GPL:: GNU General Public License 447* GNU LGPL:: GNU Lesser General Public License 448* GNU FDL:: GNU Free Documentation License 449 450@end detailmenu 451@end menu 452 453@end ifinfo 454 455@node Introduction, Users, Top, Top 456@chapter Introduction 457 458This chapter explains the goals sought in the creation 459of GNU @code{gettext} and the free Translation Project. 460Then, it explains a few broad concepts around 461Native Language Support, and positions message translation with regard 462to other aspects of national and cultural variance, as they apply 463to programs. It also surveys those files used to convey the 464translations. It explains how the various tools interact in the 465initial generation of these files, and later, how the maintenance 466cycle should usually operate. 467 468@cindex sex 469@cindex he, she, and they 470@cindex she, he, and they 471In this manual, we use @emph{he} when speaking of the programmer or 472maintainer, @emph{she} when speaking of the translator, and @emph{they} 473when speaking of the installers or end users of the translated program. 474This is only a convenience for clarifying the documentation. It is 475@emph{absolutely} not meant to imply that some roles are more appropriate 476to males or females. Besides, as you might guess, GNU @code{gettext} 477is meant to be useful for people using computers, whatever their sex, 478race, religion or nationality! 479 480@cindex bug report address 481Please send suggestions and corrections to: 482 483@example 484@group 485@r{Internet address:} 486 bug-gnu-gettext@@gnu.org 487@end group 488@end example 489 490@noindent 491Please include the manual's edition number and update date in your messages. 492 493@menu 494* Why:: The Purpose of GNU @code{gettext} 495* Concepts:: I18n, L10n, and Such 496* Aspects:: Aspects in Native Language Support 497* Files:: Files Conveying Translations 498* Overview:: Overview of GNU @code{gettext} 499@end menu 500 501@node Why, Concepts, Introduction, Introduction 502@section The Purpose of GNU @code{gettext} 503 504Usually, programs are written and documented in English, and use 505English at execution time to interact with users. This is true 506not only of GNU software, but also of a great deal of commercial 507and free software. Using a common language is quite handy for 508communication between developers, maintainers and users from all 509countries. On the other hand, most people are less comfortable with 510English than with their own native language, and would prefer to 511use their mother tongue for day to day's work, as far as possible. 512Many would simply @emph{love} to see their computer screen showing 513a lot less of English, and far more of their own language. 514 515@cindex Translation Project 516However, to many people, this dream might appear so far fetched that 517they may believe it is not even worth spending time thinking about 518it. They have no confidence at all that the dream might ever 519become true. Yet some have not lost hope, and have organized themselves. 520The Translation Project is a formalization of this hope into a 521workable structure, which has a good chance to get all of us nearer 522the achievement of a truly multi-lingual set of programs. 523 524GNU @code{gettext} is an important step for the Translation Project, 525as it is an asset on which we may build many other steps. This package 526offers to programmers, translators and even users, a well integrated 527set of tools and documentation. Specifically, the GNU @code{gettext} 528utilities are a set of tools that provides a framework within which 529other free packages may produce multi-lingual messages. These tools 530include 531 532@itemize @bullet 533@item 534A set of conventions about how programs should be written to support 535message catalogs. 536 537@item 538A directory and file naming organization for the message catalogs 539themselves. 540 541@item 542A runtime library supporting the retrieval of translated messages. 543 544@item 545A few stand-alone programs to massage in various ways the sets of 546translatable strings, or already translated strings. 547 548@item 549A library supporting the parsing and creation of files containing 550translated messages. 551 552@item 553A special mode for Emacs@footnote{In this manual, all mentions of Emacs 554refers to either GNU Emacs or to XEmacs, which people sometimes call FSF 555Emacs and Lucid Emacs, respectively.} which helps preparing these sets 556and bringing them up to date. 557@end itemize 558 559GNU @code{gettext} is designed to minimize the impact of 560internationalization on program sources, keeping this impact as small 561and hardly noticeable as possible. Internationalization has better 562chances of succeeding if it is very light weighted, or at least, 563appear to be so, when looking at program sources. 564 565The Translation Project also uses the GNU @code{gettext} distribution 566as a vehicle for documenting its structure and methods. This goes 567beyond the strict technicalities of documenting the GNU @code{gettext} 568proper. By so doing, translators will find in a single place, as 569far as possible, all they need to know for properly doing their 570translating work. Also, this supplemental documentation might also 571help programmers, and even curious users, in understanding how GNU 572@code{gettext} is related to the remainder of the Translation 573Project, and consequently, have a glimpse at the @emph{big picture}. 574 575@node Concepts, Aspects, Why, Introduction 576@section I18n, L10n, and Such 577 578@cindex i18n 579@cindex l10n 580Two long words appear all the time when we discuss support of native 581language in programs, and these words have a precise meaning, worth 582being explained here, once and for all in this document. The words are 583@emph{internationalization} and @emph{localization}. Many people, 584tired of writing these long words over and over again, took the 585habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first 586and last letter of each word, and replacing the run of intermediate 587letters by a number merely telling how many such letters there are. 588But in this manual, in the sake of clarity, we will patiently write 589the names in full, each time@dots{} 590 591@cindex internationalization 592By @dfn{internationalization}, one refers to the operation by which a 593program, or a set of programs turned into a package, is made aware of and 594able to support multiple languages. This is a generalization process, 595by which the programs are untied from calling only English strings or 596other English specific habits, and connected to generic ways of doing 597the same, instead. Program developers may use various techniques to 598internationalize their programs. Some of these have been standardized. 599GNU @code{gettext} offers one of these standards. @xref{Programmers}. 600 601@cindex localization 602By @dfn{localization}, one means the operation by which, in a set 603of programs already internationalized, one gives the program all 604needed information so that it can adapt itself to handle its input 605and output in a fashion which is correct for some native language and 606cultural habits. This is a particularisation process, by which generic 607methods already implemented in an internationalized program are used 608in specific ways. The programming environment puts several functions 609to the programmers disposal which allow this runtime configuration. 610The formal description of specific set of cultural habits for some 611country, together with all associated translations targeted to the 612same native language, is called the @dfn{locale} for this language 613or country. Users achieve localization of programs by setting proper 614values to special environment variables, prior to executing those 615programs, identifying which locale should be used. 616 617In fact, locale message support is only one component of the cultural 618data that makes up a particular locale. There are a whole host of 619routines and functions provided to aid programmers in developing 620internationalized software and which allow them to access the data 621stored in a particular locale. When someone presently refers to a 622particular locale, they are obviously referring to the data stored 623within that particular locale. Similarly, if a programmer is referring 624to ``accessing the locale routines'', they are referring to the 625complete suite of routines that access all of the locale's information. 626 627@cindex NLS 628@cindex Native Language Support 629@cindex Natural Language Support 630One uses the expression @dfn{Native Language Support}, or merely NLS, 631for speaking of the overall activity or feature encompassing both 632internationalization and localization, allowing for multi-lingual 633interactions in a program. In a nutshell, one could say that 634internationalization is the operation by which further localizations 635are made possible. 636 637Also, very roughly said, when it comes to multi-lingual messages, 638internationalization is usually taken care of by programmers, and 639localization is usually taken care of by translators. 640 641@node Aspects, Files, Concepts, Introduction 642@section Aspects in Native Language Support 643 644@cindex translation aspects 645For a totally multi-lingual distribution, there are many things to 646translate beyond output messages. 647 648@itemize @bullet 649@item 650As of today, GNU @code{gettext} offers a complete toolset for 651translating messages output by C programs. Perl scripts and shell 652scripts will also need to be translated. Even if there are today some hooks 653by which this can be done, these hooks are not integrated as well as they 654should be. 655 656@item 657Some programs, like @code{autoconf} or @code{bison}, are able 658to produce other programs (or scripts). Even if the generating 659programs themselves are internationalized, the generated programs they 660produce may need internationalization on their own, and this indirect 661internationalization could be automated right from the generating 662program. In fact, quite usually, generating and generated programs 663could be internationalized independently, as the effort needed is 664fairly orthogonal. 665 666@item 667A few programs include textual tables which might need translation 668themselves, independently of the strings contained in the program 669itself. For example, @w{RFC 1345} gives an English description for each 670character which the @code{recode} program is able to reconstruct at execution. 671Since these descriptions are extracted from the RFC by mechanical means, 672translating them properly would require a prior translation of the RFC 673itself. 674 675@item 676Almost all programs accept options, which are often worded out so to 677be descriptive for the English readers; one might want to consider 678offering translated versions for program options as well. 679 680@item 681Many programs read, interpret, compile, or are somewhat driven by 682input files which are texts containing keywords, identifiers, or 683replies which are inherently translatable. For example, one may want 684@code{gcc} to allow diacriticized characters in identifiers or use 685translated keywords; @samp{rm -i} might accept something else than 686@samp{y} or @samp{n} for replies, etc. Even if the program will 687eventually make most of its output in the foreign languages, one has 688to decide whether the input syntax, option values, etc., are to be 689localized or not. 690 691@item 692The manual accompanying a package, as well as all documentation files 693in the distribution, could surely be translated, too. Translating a 694manual, with the intent of later keeping up with updates, is a major 695undertaking in itself, generally. 696 697@end itemize 698 699As we already stressed, translation is only one aspect of locales. 700Other internationalization aspects are system services and are handled 701in GNU @code{libc}. There 702are many attributes that are needed to define a country's cultural 703conventions. These attributes include beside the country's native 704language, the formatting of the date and time, the representation of 705numbers, the symbols for currency, etc. These local @dfn{rules} are 706termed the country's locale. The locale represents the knowledge 707needed to support the country's native attributes. 708 709@cindex locale facets 710There are a few major areas which may vary between countries and 711hence, define what a locale must describe. The following list helps 712putting multi-lingual messages into the proper context of other tasks 713related to locales. See the GNU @code{libc} manual for details. 714 715@table @emph 716 717@item Characters and Codesets 718@cindex codeset 719@cindex encoding 720@cindex character encoding 721@cindex locale facet, LC_CTYPE 722 723The codeset most commonly used through out the USA and most English 724speaking parts of the world is the ASCII codeset. However, there are 725many characters needed by various locales that are not found within 726this codeset. The 8-bit @w{ISO 8859-1} code set has most of the special 727characters needed to handle the major European languages. However, in 728many cases, choosing @w{ISO 8859-1} is nevertheless not adequate: it 729doesn't even handle the major European currency. Hence each locale 730will need to specify which codeset they need to use and will need 731to have the appropriate character handling routines to cope with 732the codeset. 733 734@item Currency 735@cindex currency symbols 736@cindex locale facet, LC_MONETARY 737 738The symbols used vary from country to country as does the position 739used by the symbol. Software needs to be able to transparently 740display currency figures in the native mode for each locale. 741 742@item Dates 743@cindex date format 744@cindex locale facet, LC_TIME 745 746The format of date varies between locales. For example, Christmas day 747in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia. 748Other countries might use @w{ISO 8601} dates, etc. 749 750Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm}, 751or otherwise. Some locales require time to be specified in 24-hour 752mode rather than as AM or PM. Further, the nature and yearly extent 753of the Daylight Saving correction vary widely between countries. 754 755@item Numbers 756@cindex number format 757@cindex locale facet, LC_NUMERIC 758 759Numbers can be represented differently in different locales. 760For example, the following numbers are all written correctly for 761their respective locales: 762 763@example 76412,345.67 English 76512.345,67 German 766 12345,67 French 7671,2345.67 Asia 768@end example 769 770Some programs could go further and use different unit systems, like 771English units or Metric units, or even take into account variants 772about how numbers are spelled in full. 773 774@item Messages 775@cindex messages 776@cindex locale facet, LC_MESSAGES 777 778The most obvious area is the language support within a locale. This is 779where GNU @code{gettext} provides the means for developers and users to 780easily change the language that the software uses to communicate to 781the user. 782 783@end table 784 785@cindex Linux 786Components of locale outside of message handling are standardized in 787the ISO C standard and the SUSV2 specification. GNU @code{libc} 788fully implements this, and most other modern systems provide a more 789or less reasonable support for at least some of the missing components. 790 791@node Files, Overview, Aspects, Introduction 792@section Files Conveying Translations 793 794@cindex files, @file{.po} and @file{.mo} 795The letters PO in @file{.po} files means Portable Object, to 796distinguish it from @file{.mo} files, where MO stands for Machine 797Object. This paradigm, as well as the PO file format, is inspired 798by the NLS standard developed by Uniforum, and first implemented by 799Sun in their Solaris system. 800 801PO files are meant to be read and edited by humans, and associate each 802original, translatable string of a given package with its translation 803in a particular target language. A single PO file is dedicated to 804a single target language. If a package supports many languages, 805there is one such PO file per language supported, and each package 806has its own set of PO files. These PO files are best created by 807the @code{xgettext} program, and later updated or refreshed through 808the @code{msgmerge} program. Program @code{xgettext} extracts all 809marked messages from a set of C files and initializes a PO file with 810empty translations. Program @code{msgmerge} takes care of adjusting 811PO files between releases of the corresponding sources, commenting 812obsolete entries, initializing new ones, and updating all source 813line references. Files ending with @file{.pot} are kind of base 814translation files found in distributions, in PO file format. 815 816MO files are meant to be read by programs, and are binary in nature. 817A few systems already offer tools for creating and handling MO files 818as part of the Native Language Support coming with the system, but the 819format of these MO files is often different from system to system, 820and non-portable. The tools already provided with these systems don't 821support all the features of GNU @code{gettext}. Therefore GNU 822@code{gettext} uses its own format for MO files. Files ending with 823@file{.gmo} are really MO files, when it is known that these files use 824the GNU format. 825 826@node Overview, , Files, Introduction 827@section Overview of GNU @code{gettext} 828 829@cindex overview of @code{gettext} 830@cindex big picture 831@cindex tutorial of @code{gettext} usage 832The following diagram summarizes the relation between the files 833handled by GNU @code{gettext} and the tools acting on these files. 834It is followed by somewhat detailed explanations, which you should 835read while keeping an eye on the diagram. Having a clear understanding 836of these interrelations will surely help programmers, translators 837and maintainers. 838 839@example 840@ifhtml 841@group 842Original C Sources ───> Preparation ───> Marked C Sources ───╮ 843 │ 844 ╭─────────<─── GNU gettext Library │ 845╭─── make <───┤ │ 846│ ╰─────────<────────────────────┬───────────────╯ 847│ │ 848│ ╭─────<─── PACKAGE.pot <─── xgettext <───╯ ╭───<─── PO Compendium 849│ │ │ ↑ 850│ │ ╰───╮ │ 851│ ╰───╮ ├───> PO editor ───╮ 852│ ├────> msgmerge ──────> LANG.po ────>────────╯ │ 853│ ╭───╯ │ 854│ │ │ 855│ ╰─────────────<───────────────╮ │ 856│ ├─── New LANG.po <────────────────────╯ 857│ ╭─── LANG.gmo <─── msgfmt <───╯ 858│ │ 859│ ╰───> install ───> /.../LANG/PACKAGE.mo ───╮ 860│ ├───> "Hello world!" 861╰───────> install ───> /.../bin/PROGRAM ───────╯ 862@end group 863@end ifhtml 864@ifnothtml 865@group 866Original C Sources ---> Preparation ---> Marked C Sources ---. 867 | 868 .---------<--- GNU gettext Library | 869.--- make <---+ | 870| `---------<--------------------+---------------' 871| | 872| .-----<--- PACKAGE.pot <--- xgettext <---' .---<--- PO Compendium 873| | | ^ 874| | `---. | 875| `---. +---> PO editor ---. 876| +----> msgmerge ------> LANG.po ---->--------' | 877| .---' | 878| | | 879| `-------------<---------------. | 880| +--- New LANG.po <--------------------' 881| .--- LANG.gmo <--- msgfmt <---' 882| | 883| `---> install ---> /.../LANG/PACKAGE.mo ---. 884| +---> "Hello world!" 885`-------> install ---> /.../bin/PROGRAM -------' 886@end group 887@end ifnothtml 888@end example 889 890@cindex marking translatable strings 891As a programmer, the first step to bringing GNU @code{gettext} 892into your package is identifying, right in the C sources, those strings 893which are meant to be translatable, and those which are untranslatable. 894This tedious job can be done a little more comfortably using emacs PO 895mode, but you can use any means familiar to you for modifying your 896C sources. Beside this some other simple, standard changes are needed to 897properly initialize the translation library. @xref{Sources}, for 898more information about all this. 899 900For newly written software the strings of course can and should be 901marked while writing it. The @code{gettext} approach makes this 902very easy. Simply put the following lines at the beginning of each file 903or in a central header file: 904 905@example 906@group 907#define _(String) (String) 908#define N_(String) String 909#define textdomain(Domain) 910#define bindtextdomain(Package, Directory) 911@end group 912@end example 913 914@noindent 915Doing this allows you to prepare the sources for internationalization. 916Later when you feel ready for the step to use the @code{gettext} library 917simply replace these definitions by the following: 918 919@cindex include file @file{libintl.h} 920@example 921@group 922#include <libintl.h> 923#define _(String) gettext (String) 924#define gettext_noop(String) String 925#define N_(String) gettext_noop (String) 926@end group 927@end example 928 929@cindex link with @file{libintl} 930@cindex Linux 931@noindent 932and link against @file{libintl.a} or @file{libintl.so}. Note that on 933GNU systems, you don't need to link with @code{libintl} because the 934@code{gettext} library functions are already contained in GNU libc. 935That is all you have to change. 936 937@cindex template PO file 938@cindex files, @file{.pot} 939Once the C sources have been modified, the @code{xgettext} program 940is used to find and extract all translatable strings, and create a 941PO template file out of all these. This @file{@var{package}.pot} file 942contains all original program strings. It has sets of pointers to 943exactly where in C sources each string is used. All translations 944are set to empty. The letter @code{t} in @file{.pot} marks this as 945a Template PO file, not yet oriented towards any particular language. 946@xref{xgettext Invocation}, for more details about how one calls the 947@code{xgettext} program. If you are @emph{really} lazy, you might 948be interested at working a lot more right away, and preparing the 949whole distribution setup (@pxref{Maintainers}). By doing so, you 950spare yourself typing the @code{xgettext} command, as @code{make} 951should now generate the proper things automatically for you! 952 953The first time through, there is no @file{@var{lang}.po} yet, so the 954@code{msgmerge} step may be skipped and replaced by a mere copy of 955@file{@var{package}.pot} to @file{@var{lang}.po}, where @var{lang} 956represents the target language. See @ref{Creating} for details. 957 958Then comes the initial translation of messages. Translation in 959itself is a whole matter, still exclusively meant for humans, 960and whose complexity far overwhelms the level of this manual. 961Nevertheless, a few hints are given in some other chapter of this 962manual (@pxref{Translators}). You will also find there indications 963about how to contact translating teams, or becoming part of them, 964for sharing your translating concerns with others who target the same 965native language. 966 967While adding the translated messages into the @file{@var{lang}.po} 968PO file, if you are not using one of the dedicated PO file editors 969(@pxref{Editing}), you are on your own 970for ensuring that your efforts fully respect the PO file format, and quoting 971conventions (@pxref{PO Files}). This is surely not an impossible task, 972as this is the way many people have handled PO files around 1995. 973On the other hand, by using a PO file editor, most details 974of PO file format are taken care of for you, but you have to acquire 975some familiarity with PO file editor itself. 976 977If some common translations have already been saved into a compendium 978PO file, translators may use PO mode for initializing untranslated 979entries from the compendium, and also save selected translations into 980the compendium, updating it (@pxref{Compendium}). Compendium files 981are meant to be exchanged between members of a given translation team. 982 983Programs, or packages of programs, are dynamic in nature: users write 984bug reports and suggestion for improvements, maintainers react by 985modifying programs in various ways. The fact that a package has 986already been internationalized should not make maintainers shy 987of adding new strings, or modifying strings already translated. 988They just do their job the best they can. For the Translation 989Project to work smoothly, it is important that maintainers do not 990carry translation concerns on their already loaded shoulders, and that 991translators be kept as free as possible of programming concerns. 992 993The only concern maintainers should have is carefully marking new 994strings as translatable, when they should be, and do not otherwise 995worry about them being translated, as this will come in proper time. 996Consequently, when programs and their strings are adjusted in various 997ways by maintainers, and for matters usually unrelated to translation, 998@code{xgettext} would construct @file{@var{package}.pot} files which are 999evolving over time, so the translations carried by @file{@var{lang}.po} 1000are slowly fading out of date. 1001 1002@cindex evolution of packages 1003It is important for translators (and even maintainers) to understand 1004that package translation is a continuous process in the lifetime of a 1005package, and not something which is done once and for all at the start. 1006After an initial burst of translation activity for a given package, 1007interventions are needed once in a while, because here and there, 1008translated entries become obsolete, and new untranslated entries 1009appear, needing translation. 1010 1011The @code{msgmerge} program has the purpose of refreshing an already 1012existing @file{@var{lang}.po} file, by comparing it with a newer 1013@file{@var{package}.pot} template file, extracted by @code{xgettext} 1014out of recent C sources. The refreshing operation adjusts all 1015references to C source locations for strings, since these strings 1016move as programs are modified. Also, @code{msgmerge} comments out as 1017obsolete, in @file{@var{lang}.po}, those already translated entries 1018which are no longer used in the program sources (@pxref{Obsolete 1019Entries}). It finally discovers new strings and inserts them in 1020the resulting PO file as untranslated entries (@pxref{Untranslated 1021Entries}). @xref{msgmerge Invocation}, for more information about what 1022@code{msgmerge} really does. 1023 1024Whatever route or means taken, the goal is to obtain an updated 1025@file{@var{lang}.po} file offering translations for all strings. 1026 1027The temporal mobility, or fluidity of PO files, is an integral part of 1028the translation game, and should be well understood, and accepted. 1029People resisting it will have a hard time participating in the 1030Translation Project, or will give a hard time to other participants! In 1031particular, maintainers should relax and include all available official 1032PO files in their distributions, even if these have not recently been 1033updated, without exerting pressure on the translator teams to get the 1034job done. The pressure should rather come 1035from the community of users speaking a particular language, and 1036maintainers should consider themselves fairly relieved of any concern 1037about the adequacy of translation files. On the other hand, translators 1038should reasonably try updating the PO files they are responsible for, 1039while the package is undergoing pretest, prior to an official 1040distribution. 1041 1042Once the PO file is complete and dependable, the @code{msgfmt} program 1043is used for turning the PO file into a machine-oriented format, which 1044may yield efficient retrieval of translations by the programs of the 1045package, whenever needed at runtime (@pxref{MO Files}). @xref{msgfmt 1046Invocation}, for more information about all modes of execution 1047for the @code{msgfmt} program. 1048 1049Finally, the modified and marked C sources are compiled and linked 1050with the GNU @code{gettext} library, usually through the operation of 1051@code{make}, given a suitable @file{Makefile} exists for the project, 1052and the resulting executable is installed somewhere users will find it. 1053The MO files themselves should also be properly installed. Given the 1054appropriate environment variables are set (@pxref{End Users}), the 1055program should localize itself automatically, whenever it executes. 1056 1057The remainder of this manual has the purpose of explaining in depth the various 1058steps outlined above. 1059 1060@node Users, PO Files, Introduction, Top 1061@chapter The User's View 1062 1063When GNU @code{gettext} will truly have reached its goal, average users 1064should feel some kind of astonished pleasure, seeing the effect of 1065that strange kind of magic that just makes their own native language 1066appear everywhere on their screens. As for naive users, they would 1067ideally have no special pleasure about it, merely taking their own 1068language for @emph{granted}, and becoming rather unhappy otherwise. 1069 1070So, let's try to describe here how we would like the magic to operate, 1071as we want the users' view to be the simplest, among all ways one 1072could look at GNU @code{gettext}. All other software engineers: 1073programmers, translators, maintainers, should work together in such a 1074way that the magic becomes possible. This is a long and progressive 1075undertaking, and information is available about the progress of the 1076Translation Project. 1077 1078When a package is distributed, there are two kinds of users: 1079@dfn{installers} who fetch the distribution, unpack it, configure 1080it, compile it and install it for themselves or others to use; and 1081@dfn{end users} that call programs of the package, once these have 1082been installed at their site. GNU @code{gettext} is offering magic 1083for both installers and end users. 1084 1085@menu 1086* Matrix:: The Current @file{ABOUT-NLS} Matrix 1087* End Users:: Magic for End Users 1088@end menu 1089 1090@node Matrix, End Users, Users, Users 1091@section The Current @file{ABOUT-NLS} Matrix 1092@cindex Translation Matrix 1093@cindex available translations 1094@cindex @file{ABOUT-NLS} file 1095 1096Languages are not equally supported in all packages using GNU 1097@code{gettext}. To know if some package uses GNU @code{gettext}, one 1098may check the distribution for the @file{ABOUT-NLS} information file, for 1099some @file{@var{ll}.po} files, often kept together into some @file{po/} 1100directory, or for an @file{intl/} directory. Internationalized packages 1101have usually many @file{@var{ll}.po} files, where @var{ll} represents 1102the language. @ref{End Users} for a complete description of the format 1103for @var{ll}. 1104 1105More generally, a matrix is available for showing the current state 1106of the Translation Project, listing which packages are prepared for 1107multi-lingual messages, and which languages are supported by each. 1108Because this information changes often, this matrix is not kept within 1109this GNU @code{gettext} manual. This information is often found in 1110file @file{ABOUT-NLS} from various distributions, but is also as old as 1111the distribution itself. A recent copy of this @file{ABOUT-NLS} file, 1112containing up-to-date information, should generally be found on the 1113Translation Project sites, and also on most GNU archive sites. 1114 1115@node End Users, , Matrix, Users 1116@section Magic for End Users 1117@cindex setting up @code{gettext} at run time 1118@cindex selecting message language 1119@cindex language selection 1120 1121@vindex LANG@r{, environment variable} 1122We consider here those packages using GNU @code{gettext} internally, 1123and for which the installers did not disable translation at 1124@emph{configure} time. Then, users only have to set the @code{LANG} 1125environment variable to the appropriate @samp{@var{ll}_@var{CC}} 1126combination prior to using the programs in the package. @xref{Matrix}. 1127For example, let's presume a German site. At the shell prompt, users 1128merely have to execute @w{@samp{setenv LANG de_DE}} (in @code{csh}) or 1129@w{@samp{export LANG; LANG=de_DE}} (in @code{sh}). They could even do 1130this from their @file{.login} or @file{.profile} file. 1131 1132@node PO Files, Sources, Users, Top 1133@chapter The Format of PO Files 1134@cindex PO files' format 1135@cindex file format, @file{.po} 1136 1137The GNU @code{gettext} toolset helps programmers and translators 1138at producing, updating and using translation files, mainly those 1139PO files which are textual, editable files. This chapter explains 1140the format of PO files. 1141 1142A PO file is made up of many entries, each entry holding the relation 1143between an original untranslated string and its corresponding 1144translation. All entries in a given PO file usually pertain 1145to a single project, and all translations are expressed in a single 1146target language. One PO file @dfn{entry} has the following schematic 1147structure: 1148 1149@example 1150@var{white-space} 1151# @var{translator-comments} 1152#. @var{extracted-comments} 1153#: @var{reference}@dots{} 1154#, @var{flag}@dots{} 1155#| msgid @var{previous-untranslated-string} 1156msgid @var{untranslated-string} 1157msgstr @var{translated-string} 1158@end example 1159 1160The general structure of a PO file should be well understood by 1161the translator. When using PO mode, very little has to be known 1162about the format details, as PO mode takes care of them for her. 1163 1164A simple entry can look like this: 1165 1166@example 1167#: lib/error.c:116 1168msgid "Unknown system error" 1169msgstr "Error desconegut del sistema" 1170@end example 1171 1172@cindex comments, translator 1173@cindex comments, automatic 1174@cindex comments, extracted 1175Entries begin with some optional white space. Usually, when generated 1176through GNU @code{gettext} tools, there is exactly one blank line 1177between entries. Then comments follow, on lines all starting with the 1178character @code{#}. There are two kinds of comments: those which have 1179some white space immediately following the @code{#} - the @var{translator 1180comments} -, which comments are created and maintained exclusively by the 1181translator, and those which have some non-white character just after the 1182@code{#} - the @var{automatic comments} -, which comments are created and 1183maintained automatically by GNU @code{gettext} tools. Comment lines 1184starting with @code{#.} contain comments given by the programmer, directed 1185at the translator; these comments are called @var{extracted comments} 1186because the @code{xgettext} program extracts them from the program's 1187source code. Comment lines starting with @code{#:} contain references to 1188the program's source code. Comment lines starting with @code{#,} contain 1189flags; more about these below. Comment lines starting with @code{#|} 1190contain the previous untranslated string for which the translator gave 1191a translation. 1192 1193All comments, of either kind, are optional. 1194 1195@kwindex msgid 1196@kwindex msgstr 1197After white space and comments, entries show two strings, namely 1198first the untranslated string as it appears in the original program 1199sources, and then, the translation of this string. The original 1200string is introduced by the keyword @code{msgid}, and the translation, 1201by @code{msgstr}. The two strings, untranslated and translated, 1202are quoted in various ways in the PO file, using @code{"} 1203delimiters and @code{\} escapes, but the translator does not really 1204have to pay attention to the precise quoting format, as PO mode fully 1205takes care of quoting for her. 1206 1207The @code{msgid} strings, as well as automatic comments, are produced 1208and managed by other GNU @code{gettext} tools, and PO mode does not 1209provide means for the translator to alter these. The most she can 1210do is merely deleting them, and only by deleting the whole entry. 1211On the other hand, the @code{msgstr} string, as well as translator 1212comments, are really meant for the translator, and PO mode gives her 1213the full control she needs. 1214 1215The comment lines beginning with @code{#,} are special because they are 1216not completely ignored by the programs as comments generally are. The 1217comma separated list of @var{flag}s is used by the @code{msgfmt} 1218program to give the user some better diagnostic messages. Currently 1219there are two forms of flags defined: 1220 1221@table @code 1222@item fuzzy 1223@kwindex fuzzy@r{ flag} 1224This flag can be generated by the @code{msgmerge} program or it can be 1225inserted by the translator herself. It shows that the @code{msgstr} 1226string might not be a correct translation (anymore). Only the translator 1227can judge if the translation requires further modification, or is 1228acceptable as is. Once satisfied with the translation, she then removes 1229this @code{fuzzy} attribute. The @code{msgmerge} program inserts this 1230when it combined the @code{msgid} and @code{msgstr} entries after fuzzy 1231search only. @xref{Fuzzy Entries}. 1232 1233@item c-format 1234@kwindex c-format@r{ flag} 1235@itemx no-c-format 1236@kwindex no-c-format@r{ flag} 1237These flags should not be added by a human. Instead only the 1238@code{xgettext} program adds them. In an automated PO file processing 1239system as proposed here the user changes would be thrown away again as 1240soon as the @code{xgettext} program generates a new template file. 1241 1242The @code{c-format} flag tells that the untranslated string and the 1243translation are supposed to be C format strings. The @code{no-c-format} 1244flag tells that they are not C format strings, even though the untranslated 1245string happens to look like a C format string (with @samp{%} directives). 1246 1247In case the @code{c-format} flag is given for a string the @code{msgfmt} 1248does some more tests to check to validity of the translation. 1249@xref{msgfmt Invocation}, @ref{c-format Flag} and @ref{c-format}. 1250 1251@item objc-format 1252@kwindex objc-format@r{ flag} 1253@itemx no-objc-format 1254@kwindex no-objc-format@r{ flag} 1255Likewise for Objective C, see @ref{objc-format}. 1256 1257@item sh-format 1258@kwindex sh-format@r{ flag} 1259@itemx no-sh-format 1260@kwindex no-sh-format@r{ flag} 1261Likewise for Shell, see @ref{sh-format}. 1262 1263@item python-format 1264@kwindex python-format@r{ flag} 1265@itemx no-python-format 1266@kwindex no-python-format@r{ flag} 1267Likewise for Python, see @ref{python-format}. 1268 1269@item lisp-format 1270@kwindex lisp-format@r{ flag} 1271@itemx no-lisp-format 1272@kwindex no-lisp-format@r{ flag} 1273Likewise for Lisp, see @ref{lisp-format}. 1274 1275@item elisp-format 1276@kwindex elisp-format@r{ flag} 1277@itemx no-elisp-format 1278@kwindex no-elisp-format@r{ flag} 1279Likewise for Emacs Lisp, see @ref{elisp-format}. 1280 1281@item librep-format 1282@kwindex librep-format@r{ flag} 1283@itemx no-librep-format 1284@kwindex no-librep-format@r{ flag} 1285Likewise for librep, see @ref{librep-format}. 1286 1287@item scheme-format 1288@kwindex scheme-format@r{ flag} 1289@itemx no-scheme-format 1290@kwindex no-scheme-format@r{ flag} 1291Likewise for Scheme, see @ref{scheme-format}. 1292 1293@item smalltalk-format 1294@kwindex smalltalk-format@r{ flag} 1295@itemx no-smalltalk-format 1296@kwindex no-smalltalk-format@r{ flag} 1297Likewise for Smalltalk, see @ref{smalltalk-format}. 1298 1299@item java-format 1300@kwindex java-format@r{ flag} 1301@itemx no-java-format 1302@kwindex no-java-format@r{ flag} 1303Likewise for Java, see @ref{java-format}. 1304 1305@item csharp-format 1306@kwindex csharp-format@r{ flag} 1307@itemx no-csharp-format 1308@kwindex no-csharp-format@r{ flag} 1309Likewise for C#, see @ref{csharp-format}. 1310 1311@item awk-format 1312@kwindex awk-format@r{ flag} 1313@itemx no-awk-format 1314@kwindex no-awk-format@r{ flag} 1315Likewise for awk, see @ref{awk-format}. 1316 1317@item object-pascal-format 1318@kwindex object-pascal-format@r{ flag} 1319@itemx no-object-pascal-format 1320@kwindex no-object-pascal-format@r{ flag} 1321Likewise for Object Pascal, see @ref{object-pascal-format}. 1322 1323@item ycp-format 1324@kwindex ycp-format@r{ flag} 1325@itemx no-ycp-format 1326@kwindex no-ycp-format@r{ flag} 1327Likewise for YCP, see @ref{ycp-format}. 1328 1329@item tcl-format 1330@kwindex tcl-format@r{ flag} 1331@itemx no-tcl-format 1332@kwindex no-tcl-format@r{ flag} 1333Likewise for Tcl, see @ref{tcl-format}. 1334 1335@item perl-format 1336@kwindex perl-format@r{ flag} 1337@itemx no-perl-format 1338@kwindex no-perl-format@r{ flag} 1339Likewise for Perl, see @ref{perl-format}. 1340 1341@item perl-brace-format 1342@kwindex perl-brace-format@r{ flag} 1343@itemx no-perl-brace-format 1344@kwindex no-perl-brace-format@r{ flag} 1345Likewise for Perl brace, see @ref{perl-format}. 1346 1347@item php-format 1348@kwindex php-format@r{ flag} 1349@itemx no-php-format 1350@kwindex no-php-format@r{ flag} 1351Likewise for PHP, see @ref{php-format}. 1352 1353@item gcc-internal-format 1354@kwindex gcc-internal-format@r{ flag} 1355@itemx no-gcc-internal-format 1356@kwindex no-gcc-internal-format@r{ flag} 1357Likewise for the GCC sources, see @ref{gcc-internal-format}. 1358 1359@item qt-format 1360@kwindex qt-format@r{ flag} 1361@itemx no-qt-format 1362@kwindex no-qt-format@r{ flag} 1363Likewise for Qt, see @ref{qt-format}. 1364 1365@item boost-format 1366@kwindex boost-format@r{ flag} 1367@itemx no-boost-format 1368@kwindex no-boost-format@r{ flag} 1369Likewise for Boost, see @ref{boost-format}. 1370 1371@end table 1372 1373@kwindex msgctxt 1374@cindex context, in PO files 1375It is also possible to have entries with a context specifier. They look like 1376this: 1377 1378@example 1379@var{white-space} 1380# @var{translator-comments} 1381#. @var{extracted-comments} 1382#: @var{reference}@dots{} 1383#, @var{flag}@dots{} 1384#| msgctxt @var{previous-context} 1385#| msgid @var{previous-untranslated-string} 1386msgctxt @var{context} 1387msgid @var{untranslated-string} 1388msgstr @var{translated-string} 1389@end example 1390 1391The context serves to disambiguate messages with the same 1392@var{untranslated-string}. It is possible to have several entries with 1393the same @var{untranslated-string} in a PO file, provided that they each 1394have a different @var{context}. Note that an empty @var{context} string 1395and an absent @code{msgctxt} line do not mean the same thing. 1396 1397@kwindex msgid_plural 1398@cindex plural forms, in PO files 1399A different kind of entries is used for translations which involve 1400plural forms. 1401 1402@example 1403@var{white-space} 1404# @var{translator-comments} 1405#. @var{extracted-comments} 1406#: @var{reference}@dots{} 1407#, @var{flag}@dots{} 1408#| msgid @var{previous-untranslated-string-singular} 1409#| msgid_plural @var{previous-untranslated-string-plural} 1410msgid @var{untranslated-string-singular} 1411msgid_plural @var{untranslated-string-plural} 1412msgstr[0] @var{translated-string-case-0} 1413... 1414msgstr[N] @var{translated-string-case-n} 1415@end example 1416 1417Such an entry can look like this: 1418 1419@example 1420#: src/msgcmp.c:338 src/po-lex.c:699 1421#, c-format 1422msgid "found %d fatal error" 1423msgid_plural "found %d fatal errors" 1424msgstr[0] "s'ha trobat %d error fatal" 1425msgstr[1] "s'han trobat %d errors fatals" 1426@end example 1427 1428Here also, a @code{msgctxt} context can be specified before @code{msgid}, 1429like above. 1430 1431The @var{previous-untranslated-string} is optionally inserted by the 1432@code{msgmerge} program, at the same time when it marks a message fuzzy. 1433It helps the translator to see which changes were done by the developers 1434on the @var{untranslated-string}. 1435 1436It happens that some lines, usually whitespace or comments, follow the 1437very last entry of a PO file. Such lines are not part of any entry, 1438and will be dropped when the PO file is processed by the tools, or may 1439disturb some PO file editors. 1440 1441The remainder of this section may be safely skipped by those using 1442a PO file editor, yet it may be interesting for everybody to have a better 1443idea of the precise format of a PO file. On the other hand, those 1444wishing to modify PO files by hand should carefully continue reading on. 1445 1446Each of @var{untranslated-string} and @var{translated-string} respects 1447the C syntax for a character string, including the surrounding quotes 1448and embedded backslashed escape sequences. When the time comes 1449to write multi-line strings, one should not use escaped newlines. 1450Instead, a closing quote should follow the last character on the 1451line to be continued, and an opening quote should resume the string 1452at the beginning of the following PO file line. For example: 1453 1454@example 1455msgid "" 1456"Here is an example of how one might continue a very long string\n" 1457"for the common case the string represents multi-line output.\n" 1458@end example 1459 1460@noindent 1461In this example, the empty string is used on the first line, to 1462allow better alignment of the @code{H} from the word @samp{Here} 1463over the @code{f} from the word @samp{for}. In this example, the 1464@code{msgid} keyword is followed by three strings, which are meant 1465to be concatenated. Concatenating the empty string does not change 1466the resulting overall string, but it is a way for us to comply with 1467the necessity of @code{msgid} to be followed by a string on the same 1468line, while keeping the multi-line presentation left-justified, as 1469we find this to be a cleaner disposition. The empty string could have 1470been omitted, but only if the string starting with @samp{Here} was 1471promoted on the first line, right after @code{msgid}.@footnote{This 1472limitation is not imposed by GNU @code{gettext}, but is for compatibility 1473with the @code{msgfmt} implementation on Solaris.} It was not really necessary 1474either to switch between the two last quoted strings immediately after 1475the newline @samp{\n}, the switch could have occurred after @emph{any} 1476other character, we just did it this way because it is neater. 1477 1478@cindex newlines in PO files 1479One should carefully distinguish between end of lines marked as 1480@samp{\n} @emph{inside} quotes, which are part of the represented 1481string, and end of lines in the PO file itself, outside string quotes, 1482which have no incidence on the represented string. 1483 1484@cindex comments in PO files 1485Outside strings, white lines and comments may be used freely. 1486Comments start at the beginning of a line with @samp{#} and extend 1487until the end of the PO file line. Comments written by translators 1488should have the initial @samp{#} immediately followed by some white 1489space. If the @samp{#} is not immediately followed by white space, 1490this comment is most likely generated and managed by specialized GNU 1491tools, and might disappear or be replaced unexpectedly when the PO 1492file is given to @code{msgmerge}. 1493 1494@node Sources, Template, PO Files, Top 1495@chapter Preparing Program Sources 1496@cindex preparing programs for translation 1497 1498@c FIXME: Rewrite (the whole chapter). 1499 1500For the programmer, changes to the C source code fall into three 1501categories. First, you have to make the localization functions 1502known to all modules needing message translation. Second, you should 1503properly trigger the operation of GNU @code{gettext} when the program 1504initializes, usually from the @code{main} function. Last, you should 1505identify, adjust and mark all constant strings in your program 1506needing translation. 1507 1508@menu 1509* Importing:: Importing the @code{gettext} declaration 1510* Triggering:: Triggering @code{gettext} Operations 1511* Preparing Strings:: Preparing Translatable Strings 1512* Mark Keywords:: How Marks Appear in Sources 1513* Marking:: Marking Translatable Strings 1514* c-format Flag:: Telling something about the following string 1515* Special cases:: Special Cases of Translatable Strings 1516* Names:: Marking Proper Names for Translation 1517* Libraries:: Preparing Library Sources 1518@end menu 1519 1520@node Importing, Triggering, Sources, Sources 1521@section Importing the @code{gettext} declaration 1522 1523Presuming that your set of programs, or package, has been adjusted 1524so all needed GNU @code{gettext} files are available, and your 1525@file{Makefile} files are adjusted (@pxref{Maintainers}), each C module 1526having translated C strings should contain the line: 1527 1528@cindex include file @file{libintl.h} 1529@example 1530#include <libintl.h> 1531@end example 1532 1533Similarly, each C module containing @code{printf()}/@code{fprintf()}/... 1534calls with a format string that could be a translated C string (even if 1535the C string comes from a different C module) should contain the line: 1536 1537@example 1538#include <libintl.h> 1539@end example 1540 1541@node Triggering, Preparing Strings, Importing, Sources 1542@section Triggering @code{gettext} Operations 1543 1544@cindex initialization 1545The initialization of locale data should be done with more or less 1546the same code in every program, as demonstrated below: 1547 1548@example 1549@group 1550int 1551main (int argc, char *argv[]) 1552@{ 1553 @dots{} 1554 setlocale (LC_ALL, ""); 1555 bindtextdomain (PACKAGE, LOCALEDIR); 1556 textdomain (PACKAGE); 1557 @dots{} 1558@} 1559@end group 1560@end example 1561 1562@var{PACKAGE} and @var{LOCALEDIR} should be provided either by 1563@file{config.h} or by the Makefile. For now consult the @code{gettext} 1564or @code{hello} sources for more information. 1565 1566@cindex locale facet, LC_ALL 1567@cindex locale facet, LC_CTYPE 1568The use of @code{LC_ALL} might not be appropriate for you. 1569@code{LC_ALL} includes all locale categories and especially 1570@code{LC_CTYPE}. This later category is responsible for determining 1571character classes with the @code{isalnum} etc. functions from 1572@file{ctype.h} which could especially for programs, which process some 1573kind of input language, be wrong. For example this would mean that a 1574source code using the @,{c} (c-cedilla character) is runnable in 1575France but not in the U.S. 1576 1577Some systems also have problems with parsing numbers using the 1578@code{scanf} functions if an other but the @code{LC_ALL} locale is used. 1579The standards say that additional formats but the one known in the 1580@code{"C"} locale might be recognized. But some systems seem to reject 1581numbers in the @code{"C"} locale format. In some situation, it might 1582also be a problem with the notation itself which makes it impossible to 1583recognize whether the number is in the @code{"C"} locale or the local 1584format. This can happen if thousands separator characters are used. 1585Some locales define this character according to the national 1586conventions to @code{'.'} which is the same character used in the 1587@code{"C"} locale to denote the decimal point. 1588 1589So it is sometimes necessary to replace the @code{LC_ALL} line in the 1590code above by a sequence of @code{setlocale} lines 1591 1592@example 1593@group 1594@{ 1595 @dots{} 1596 setlocale (LC_CTYPE, ""); 1597 setlocale (LC_MESSAGES, ""); 1598 @dots{} 1599@} 1600@end group 1601@end example 1602 1603@cindex locale facet, LC_CTYPE 1604@cindex locale facet, LC_COLLATE 1605@cindex locale facet, LC_MONETARY 1606@cindex locale facet, LC_NUMERIC 1607@cindex locale facet, LC_TIME 1608@cindex locale facet, LC_MESSAGES 1609@cindex locale facet, LC_RESPONSES 1610@noindent 1611On all POSIX conformant systems the locale categories @code{LC_CTYPE}, 1612@code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY}, 1613@code{LC_NUMERIC}, and @code{LC_TIME} are available. On some systems 1614which are only ISO C compliant, @code{LC_MESSAGES} is missing, but 1615a substitute for it is defined in GNU gettext's @code{<libintl.h>}. 1616 1617Note that changing the @code{LC_CTYPE} also affects the functions 1618declared in the @code{<ctype.h>} standard header. If this is not 1619desirable in your application (for example in a compiler's parser), 1620you can use a set of substitute functions which hardwire the C locale, 1621such as found in the @code{<c-ctype.h>} and @code{<c-ctype.c>} files 1622in the gettext source distribution. 1623 1624It is also possible to switch the locale forth and back between the 1625environment dependent locale and the C locale, but this approach is 1626normally avoided because a @code{setlocale} call is expensive, 1627because it is tedious to determine the places where a locale switch 1628is needed in a large program's source, and because switching a locale 1629is not multithread-safe. 1630 1631@node Preparing Strings, Mark Keywords, Triggering, Sources 1632@section Preparing Translatable Strings 1633 1634@cindex marking strings, preparations 1635Before strings can be marked for translations, they sometimes need to 1636be adjusted. Usually preparing a string for translation is done right 1637before marking it, during the marking phase which is described in the 1638next sections. What you have to keep in mind while doing that is the 1639following. 1640 1641@itemize @bullet 1642@item 1643Decent English style. 1644 1645@item 1646Entire sentences. 1647 1648@item 1649Split at paragraphs. 1650 1651@item 1652Use format strings instead of string concatenation. 1653 1654@item 1655Avoid unusual markup and unusual control characters. 1656@end itemize 1657 1658@noindent 1659Let's look at some examples of these guidelines. 1660 1661@cindex style 1662Translatable strings should be in good English style. If slang language 1663with abbreviations and shortcuts is used, often translators will not 1664understand the message and will produce very inappropriate translations. 1665 1666@example 1667"%s: is parameter\n" 1668@end example 1669 1670@noindent 1671This is nearly untranslatable: Is the displayed item @emph{a} parameter or 1672@emph{the} parameter? 1673 1674@example 1675"No match" 1676@end example 1677 1678@noindent 1679The ambiguity in this message makes it unintelligible: Is the program 1680attempting to set something on fire? Does it mean "The given object does 1681not match the template"? Does it mean "The template does not fit for any 1682of the objects"? 1683 1684@cindex ambiguities 1685In both cases, adding more words to the message will help both the 1686translator and the English speaking user. 1687 1688@cindex sentences 1689Translatable strings should be entire sentences. It is often not possible 1690to translate single verbs or adjectives in a substitutable way. 1691 1692@example 1693printf ("File %s is %s protected", filename, rw ? "write" : "read"); 1694@end example 1695 1696@noindent 1697Most translators will not look at the source and will thus only see the 1698string @code{"File %s is %s protected"}, which is unintelligible. Change 1699this to 1700 1701@example 1702printf (rw ? "File %s is write protected" : "File %s is read protected", 1703 filename); 1704@end example 1705 1706@noindent 1707This way the translator will not only understand the message, she will 1708also be able to find the appropriate grammatical construction. A French 1709translator for example translates "write protected" like "protected 1710against writing". 1711 1712Entire sentences are also important because in many languages, the 1713declination of some word in a sentence depends on the gender or the 1714number (singular/plural) of another part of the sentence. There are 1715usually more interdependencies between words than in English. The 1716consequence is that asking a translator to translate two half-sentences 1717and then combining these two half-sentences through dumb string concatenation 1718will not work, for many languages, even though it would work for English. 1719That's why translators need to handle entire sentences. 1720 1721Often sentences don't fit into a single line. If a sentence is output 1722using two subsequent @code{printf} statements, like this 1723 1724@example 1725printf ("Locale charset \"%s\" is different from\n", lcharset); 1726printf ("input file charset \"%s\".\n", fcharset); 1727@end example 1728 1729@noindent 1730the translator would have to translate two half sentences, but nothing 1731in the POT file would tell her that the two half sentences belong together. 1732It is necessary to merge the two @code{printf} statements so that the 1733translator can handle the entire sentence at once and decide at which 1734place to insert a line break in the translation (if at all): 1735 1736@example 1737printf ("Locale charset \"%s\" is different from\n\ 1738input file charset \"%s\".\n", lcharset, fcharset); 1739@end example 1740 1741You may now ask: how about two or more adjacent sentences? Like in this case: 1742 1743@example 1744puts ("Apollo 13 scenario: Stack overflow handling failed."); 1745puts ("On the next stack overflow we will crash!!!"); 1746@end example 1747 1748@noindent 1749Should these two statements merged into a single one? I would recommend to 1750merge them if the two sentences are related to each other, because then it 1751makes it easier for the translator to understand and translate both. On 1752the other hand, if one of the two messages is a stereotypic one, occurring 1753in other places as well, you will do a favour to the translator by not 1754merging the two. (Identical messages occurring in several places are 1755combined by xgettext, so the translator has to handle them once only.) 1756 1757@cindex paragraphs 1758Translatable strings should be limited to one paragraph; don't let a 1759single message be longer than ten lines. The reason is that when the 1760translatable string changes, the translator is faced with the task of 1761updating the entire translated string. Maybe only a single word will 1762have changed in the English string, but the translator doesn't see that 1763(with the current translation tools), therefore she has to proofread 1764the entire message. 1765 1766@cindex help option 1767Many GNU programs have a @samp{--help} output that extends over several 1768screen pages. It is a courtesy towards the translators to split such a 1769message into several ones of five to ten lines each. While doing that, 1770you can also attempt to split the documented options into groups, 1771such as the input options, the output options, and the informative 1772output options. This will help every user to find the option he is 1773looking for. 1774 1775@cindex string concatenation 1776@cindex concatenation of strings 1777Hardcoded string concatenation is sometimes used to construct English 1778strings: 1779 1780@example 1781strcpy (s, "Replace "); 1782strcat (s, object1); 1783strcat (s, " with "); 1784strcat (s, object2); 1785strcat (s, "?"); 1786@end example 1787 1788@noindent 1789In order to present to the translator only entire sentences, and also 1790because in some languages the translator might want to swap the order 1791of @code{object1} and @code{object2}, it is necessary to change this 1792to use a format string: 1793 1794@example 1795sprintf (s, "Replace %s with %s?", object1, object2); 1796@end example 1797 1798@cindex @code{inttypes.h} 1799A similar case is compile time concatenation of strings. The ISO C 99 1800include file @code{<inttypes.h>} contains a macro @code{PRId64} that 1801can be used as a formatting directive for outputting an @samp{int64_t} 1802integer through @code{printf}. It expands to a constant string, usually 1803"d" or "ld" or "lld" or something like this, depending on the platform. 1804Assume you have code like 1805 1806@example 1807printf ("The amount is %0" PRId64 "\n", number); 1808@end example 1809 1810@noindent 1811The @code{gettext} tools and library have special support for these 1812@code{<inttypes.h>} macros. You can therefore simply write 1813 1814@example 1815printf (gettext ("The amount is %0" PRId64 "\n"), number); 1816@end example 1817 1818@noindent 1819The PO file will contain the string "The amount is %0<PRId64>\n". 1820The translators will provide a translation containing "%0<PRId64>" 1821as well, and at runtime the @code{gettext} function's result will 1822contain the appropriate constant string, "d" or "ld" or "lld". 1823 1824This works only for the predefined @code{<inttypes.h>} macros. If 1825you have defined your own similar macros, let's say @samp{MYPRId64}, 1826that are not known to @code{xgettext}, the solution for this problem 1827is to change the code like this: 1828 1829@example 1830char buf1[100]; 1831sprintf (buf1, "%0" MYPRId64, number); 1832printf (gettext ("The amount is %s\n"), buf1); 1833@end example 1834 1835This means, you put the platform dependent code in one statement, and the 1836internationalization code in a different statement. Note that a buffer length 1837of 100 is safe, because all available hardware integer types are limited to 1838128 bits, and to print a 128 bit integer one needs at most 54 characters, 1839regardless whether in decimal, octal or hexadecimal. 1840 1841@cindex Java, string concatenation 1842@cindex C#, string concatenation 1843All this applies to other programming languages as well. For example, in 1844Java and C#, string concatenation is very frequently used, because it is a 1845compiler built-in operator. Like in C, in Java, you would change 1846 1847@example 1848System.out.println("Replace "+object1+" with "+object2+"?"); 1849@end example 1850 1851@noindent 1852into a statement involving a format string: 1853 1854@example 1855System.out.println( 1856 MessageFormat.format("Replace @{0@} with @{1@}?", 1857 new Object[] @{ object1, object2 @})); 1858@end example 1859 1860@noindent 1861Similarly, in C#, you would change 1862 1863@example 1864Console.WriteLine("Replace "+object1+" with "+object2+"?"); 1865@end example 1866 1867@noindent 1868into a statement involving a format string: 1869 1870@example 1871Console.WriteLine( 1872 String.Format("Replace @{0@} with @{1@}?", object1, object2)); 1873@end example 1874 1875@cindex markup 1876@cindex control characters 1877Unusual markup or control characters should not be used in translatable 1878strings. Translators will likely not understand the particular meaning 1879of the markup or control characters. 1880 1881For example, if you have a convention that @samp{|} delimits the 1882left-hand and right-hand part of some GUI elements, translators will 1883often not understand it without specific comments. It might be 1884better to have the translator translate the left-hand and right-hand 1885part separately. 1886 1887Another example is the @samp{argp} convention to use a single @samp{\v} 1888(vertical tab) control character to delimit two sections inside a 1889string. This is flawed. Some translators may convert it to a simple 1890newline, some to blank lines. With some PO file editors it may not be 1891easy to even enter a vertical tab control character. So, you cannot 1892be sure that the translation will contain a @samp{\v} character, at the 1893corresponding position. The solution is, again, to let the translator 1894translate two separate strings and combine at run-time the two translated 1895strings with the @samp{\v} required by the convention. 1896 1897HTML markup, however, is common enough that it's probably ok to use in 1898translatable strings. But please bear in mind that the GNU gettext tools 1899don't verify that the translations are well-formed HTML. 1900 1901@node Mark Keywords, Marking, Preparing Strings, Sources 1902@section How Marks Appear in Sources 1903@cindex marking strings that require translation 1904 1905All strings requiring translation should be marked in the C sources. Marking 1906is done in such a way that each translatable string appears to be 1907the sole argument of some function or preprocessor macro. There are 1908only a few such possible functions or macros meant for translation, 1909and their names are said to be marking keywords. The marking is 1910attached to strings themselves, rather than to what we do with them. 1911This approach has more uses. A blatant example is an error message 1912produced by formatting. The format string needs translation, as 1913well as some strings inserted through some @samp{%s} specification 1914in the format, while the result from @code{sprintf} may have so many 1915different instances that it is impractical to list them all in some 1916@samp{error_string_out()} routine, say. 1917 1918This marking operation has two goals. The first goal of marking 1919is for triggering the retrieval of the translation, at run time. 1920The keyword is possibly resolved into a routine able to dynamically 1921return the proper translation, as far as possible or wanted, for the 1922argument string. Most localizable strings are found in executable 1923positions, that is, attached to variables or given as parameters to 1924functions. But this is not universal usage, and some translatable 1925strings appear in structured initializations. @xref{Special cases}. 1926 1927The second goal of the marking operation is to help @code{xgettext} 1928at properly extracting all translatable strings when it scans a set 1929of program sources and produces PO file templates. 1930 1931The canonical keyword for marking translatable strings is 1932@samp{gettext}, it gave its name to the whole GNU @code{gettext} 1933package. For packages making only light use of the @samp{gettext} 1934keyword, macro or function, it is easily used @emph{as is}. However, 1935for packages using the @code{gettext} interface more heavily, it 1936is usually more convenient to give the main keyword a shorter, less 1937obtrusive name. Indeed, the keyword might appear on a lot of strings 1938all over the package, and programmers usually do not want nor need 1939their program sources to remind them forcefully, all the time, that they 1940are internationalized. Further, a long keyword has the disadvantage 1941of using more horizontal space, forcing more indentation work on 1942sources for those trying to keep them within 79 or 80 columns. 1943 1944@cindex @code{_}, a macro to mark strings for translation 1945Many packages use @samp{_} (a simple underline) as a keyword, 1946and write @samp{_("Translatable string")} instead of @samp{gettext 1947("Translatable string")}. Further, the coding rule, from GNU standards, 1948wanting that there is a space between the keyword and the opening 1949parenthesis is relaxed, in practice, for this particular usage. 1950So, the textual overhead per translatable string is reduced to 1951only three characters: the underline and the two parentheses. 1952However, even if GNU @code{gettext} uses this convention internally, 1953it does not offer it officially. The real, genuine keyword is truly 1954@samp{gettext} indeed. It is fairly easy for those wanting to use 1955@samp{_} instead of @samp{gettext} to declare: 1956 1957@example 1958#include <libintl.h> 1959#define _(String) gettext (String) 1960@end example 1961 1962@noindent 1963instead of merely using @samp{#include <libintl.h>}. 1964 1965The marking keywords @samp{gettext} and @samp{_} take the translatable 1966string as sole argument. It is also possible to define marking functions 1967that take it at another argument position. It is even possible to make 1968the marked argument position depend on the total number of arguments of 1969the function call; this is useful in C++. All this is achieved using 1970@code{xgettext}'s @samp{--keyword} option. 1971 1972Note also that long strings can be split across lines, into multiple 1973adjacent string tokens. Automatic string concatenation is performed 1974at compile time according to ISO C and ISO C++; @code{xgettext} also 1975supports this syntax. 1976 1977Later on, the maintenance is relatively easy. If, as a programmer, 1978you add or modify a string, you will have to ask yourself if the 1979new or altered string requires translation, and include it within 1980@samp{_()} if you think it should be translated. For example, @samp{"%s"} 1981is an example of string @emph{not} requiring translation. But 1982@samp{"%s: %d"} @emph{does} require translation, because in French, unlike 1983in English, it's customary to put a space before a colon. 1984 1985@node Marking, c-format Flag, Mark Keywords, Sources 1986@section Marking Translatable Strings 1987@emindex marking strings for translation 1988 1989In PO mode, one set of features is meant more for the programmer than 1990for the translator, and allows him to interactively mark which strings, 1991in a set of program sources, are translatable, and which are not. 1992Even if it is a fairly easy job for a programmer to find and mark 1993such strings by other means, using any editor of his choice, PO mode 1994makes this work more comfortable. Further, this gives translators 1995who feel a little like programmers, or programmers who feel a little 1996like translators, a tool letting them work at marking translatable 1997strings in the program sources, while simultaneously producing a set of 1998translation in some language, for the package being internationalized. 1999 2000@emindex @code{etags}, using for marking strings 2001The set of program sources, targeted by the PO mode commands describe 2002here, should have an Emacs tags table constructed for your project, 2003prior to using these PO file commands. This is easy to do. In any 2004shell window, change the directory to the root of your project, then 2005execute a command resembling: 2006 2007@example 2008etags src/*.[hc] lib/*.[hc] 2009@end example 2010 2011@noindent 2012presuming here you want to process all @file{.h} and @file{.c} files 2013from the @file{src/} and @file{lib/} directories. This command will 2014explore all said files and create a @file{TAGS} file in your root 2015directory, somewhat summarizing the contents using a special file 2016format Emacs can understand. 2017 2018@emindex @file{TAGS}, and marking translatable strings 2019For packages following the GNU coding standards, there is 2020a make goal @code{tags} or @code{TAGS} which constructs the tag files in 2021all directories and for all files containing source code. 2022 2023Once your @file{TAGS} file is ready, the following commands assist 2024the programmer at marking translatable strings in his set of sources. 2025But these commands are necessarily driven from within a PO file 2026window, and it is likely that you do not even have such a PO file yet. 2027This is not a problem at all, as you may safely open a new, empty PO 2028file, mainly for using these commands. This empty PO file will slowly 2029fill in while you mark strings as translatable in your program sources. 2030 2031@table @kbd 2032@item , 2033@efindex ,@r{, PO Mode command} 2034Search through program sources for a string which looks like a 2035candidate for translation (@code{po-tags-search}). 2036 2037@item M-, 2038@efindex M-,@r{, PO Mode command} 2039Mark the last string found with @samp{_()} (@code{po-mark-translatable}). 2040 2041@item M-. 2042@efindex M-.@r{, PO Mode command} 2043Mark the last string found with a keyword taken from a set of possible 2044keywords. This command with a prefix allows some management of these 2045keywords (@code{po-select-mark-and-mark}). 2046 2047@end table 2048 2049@efindex po-tags-search@r{, PO Mode command} 2050The @kbd{,} (@code{po-tags-search}) command searches for the next 2051occurrence of a string which looks like a possible candidate for 2052translation, and displays the program source in another Emacs window, 2053positioned in such a way that the string is near the top of this other 2054window. If the string is too big to fit whole in this window, it is 2055positioned so only its end is shown. In any case, the cursor 2056is left in the PO file window. If the shown string would be better 2057presented differently in different native languages, you may mark it 2058using @kbd{M-,} or @kbd{M-.}. Otherwise, you might rather ignore it 2059and skip to the next string by merely repeating the @kbd{,} command. 2060 2061A string is a good candidate for translation if it contains a sequence 2062of three or more letters. A string containing at most two letters in 2063a row will be considered as a candidate if it has more letters than 2064non-letters. The command disregards strings containing no letters, 2065or isolated letters only. It also disregards strings within comments, 2066or strings already marked with some keyword PO mode knows (see below). 2067 2068If you have never told Emacs about some @file{TAGS} file to use, the 2069command will request that you specify one from the minibuffer, the 2070first time you use the command. You may later change your @file{TAGS} 2071file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}}, 2072which will ask you to name the precise @file{TAGS} file you want 2073to use. @xref{Tags, , Tag Tables, emacs, The Emacs Editor}. 2074 2075Each time you use the @kbd{,} command, the search resumes from where it was 2076left by the previous search, and goes through all program sources, 2077obeying the @file{TAGS} file, until all sources have been processed. 2078However, by giving a prefix argument to the command @w{(@kbd{C-u 2079,})}, you may request that the search be restarted all over again 2080from the first program source; but in this case, strings that you 2081recently marked as translatable will be automatically skipped. 2082 2083Using this @kbd{,} command does not prevent using of other regular 2084Emacs tags commands. For example, regular @code{tags-search} or 2085@code{tags-query-replace} commands may be used without disrupting the 2086independent @kbd{,} search sequence. However, as implemented, the 2087@emph{initial} @kbd{,} command (or the @kbd{,} command is used with a 2088prefix) might also reinitialize the regular Emacs tags searching to the 2089first tags file, this reinitialization might be considered spurious. 2090 2091@efindex po-mark-translatable@r{, PO Mode command} 2092@efindex po-select-mark-and-mark@r{, PO Mode command} 2093The @kbd{M-,} (@code{po-mark-translatable}) command will mark the 2094recently found string with the @samp{_} keyword. The @kbd{M-.} 2095(@code{po-select-mark-and-mark}) command will request that you type 2096one keyword from the minibuffer and use that keyword for marking 2097the string. Both commands will automatically create a new PO file 2098untranslated entry for the string being marked, and make it the 2099current entry (making it easy for you to immediately proceed to its 2100translation, if you feel like doing it right away). It is possible 2101that the modifications made to the program source by @kbd{M-,} or 2102@kbd{M-.} render some source line longer than 80 columns, forcing you 2103to break and re-indent this line differently. You may use the @kbd{O} 2104command from PO mode, or any other window changing command from 2105Emacs, to break out into the program source window, and do any 2106needed adjustments. You will have to use some regular Emacs command 2107to return the cursor to the PO file window, if you want command 2108@kbd{,} for the next string, say. 2109 2110The @kbd{M-.} command has a few built-in speedups, so you do not 2111have to explicitly type all keywords all the time. The first such 2112speedup is that you are presented with a @emph{preferred} keyword, 2113which you may accept by merely typing @kbd{@key{RET}} at the prompt. 2114The second speedup is that you may type any non-ambiguous prefix of the 2115keyword you really mean, and the command will complete it automatically 2116for you. This also means that PO mode has to @emph{know} all 2117your possible keywords, and that it will not accept mistyped keywords. 2118 2119If you reply @kbd{?} to the keyword request, the command gives a 2120list of all known keywords, from which you may choose. When the 2121command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits 2122updating any program source or PO file buffer, and does some simple 2123keyword management instead. In this case, the command asks for a 2124keyword, written in full, which becomes a new allowed keyword for 2125later @kbd{M-.} commands. Moreover, this new keyword automatically 2126becomes the @emph{preferred} keyword for later commands. By typing 2127an already known keyword in response to @w{@kbd{C-u M-.}}, one merely 2128changes the @emph{preferred} keyword and does nothing more. 2129 2130All keywords known for @kbd{M-.} are recognized by the @kbd{,} command 2131when scanning for strings, and strings already marked by any of those 2132known keywords are automatically skipped. If many PO files are opened 2133simultaneously, each one has its own independent set of known keywords. 2134There is no provision in PO mode, currently, for deleting a known 2135keyword, you have to quit the file (maybe using @kbd{q}) and reopen 2136it afresh. When a PO file is newly brought up in an Emacs window, only 2137@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext} 2138is preferred for the @kbd{M-.} command. In fact, this is not useful to 2139prefer @samp{_}, as this one is already built in the @kbd{M-,} command. 2140 2141@node c-format Flag, Special cases, Marking, Sources 2142@section Special Comments preceding Keywords 2143 2144@c FIXME document c-format and no-c-format. 2145 2146@cindex format strings 2147In C programs strings are often used within calls of functions from the 2148@code{printf} family. The special thing about these format strings is 2149that they can contain format specifiers introduced with @kbd{%}. Assume 2150we have the code 2151 2152@example 2153printf (gettext ("String `%s' has %d characters\n"), s, strlen (s)); 2154@end example 2155 2156@noindent 2157A possible German translation for the above string might be: 2158 2159@example 2160"%d Zeichen lang ist die Zeichenkette `%s'" 2161@end example 2162 2163A C programmer, even if he cannot speak German, will recognize that 2164there is something wrong here. The order of the two format specifiers 2165is changed but of course the arguments in the @code{printf} don't have. 2166This will most probably lead to problems because now the length of the 2167string is regarded as the address. 2168 2169To prevent errors at runtime caused by translations the @code{msgfmt} 2170tool can check statically whether the arguments in the original and the 2171translation string match in type and number. If this is not the case 2172and the @samp{-c} option has been passed to @code{msgfmt}, @code{msgfmt} 2173will give an error and refuse to produce a MO file. Thus consequent 2174use of @samp{msgfmt -c} will catch the error, so that it cannot cause 2175cause problems at runtime. 2176 2177@noindent 2178If the word order in the above German translation would be correct one 2179would have to write 2180 2181@example 2182"%2$d Zeichen lang ist die Zeichenkette `%1$s'" 2183@end example 2184 2185@noindent 2186The routines in @code{msgfmt} know about this special notation. 2187 2188Because not all strings in a program must be format strings it is not 2189useful for @code{msgfmt} to test all the strings in the @file{.po} file. 2190This might cause problems because the string might contain what looks 2191like a format specifier, but the string is not used in @code{printf}. 2192 2193Therefore the @code{xgettext} adds a special tag to those messages it 2194thinks might be a format string. There is no absolute rule for this, 2195only a heuristic. In the @file{.po} file the entry is marked using the 2196@code{c-format} flag in the @code{#,} comment line (@pxref{PO Files}). 2197 2198@kwindex c-format@r{, and @code{xgettext}} 2199@kwindex no-c-format@r{, and @code{xgettext}} 2200The careful reader now might say that this again can cause problems. 2201The heuristic might guess it wrong. This is true and therefore 2202@code{xgettext} knows about a special kind of comment which lets 2203the programmer take over the decision. If in the same line as or 2204the immediately preceding line to the @code{gettext} keyword 2205the @code{xgettext} program finds a comment containing the words 2206@code{xgettext:c-format}, it will mark the string in any case with 2207the @code{c-format} flag. This kind of comment should be used when 2208@code{xgettext} does not recognize the string as a format string but 2209it really is one and it should be tested. Please note that when the 2210comment is in the same line as the @code{gettext} keyword, it must be 2211before the string to be translated. 2212 2213This situation happens quite often. The @code{printf} function is often 2214called with strings which do not contain a format specifier. Of course 2215one would normally use @code{fputs} but it does happen. In this case 2216@code{xgettext} does not recognize this as a format string but what 2217happens if the translation introduces a valid format specifier? The 2218@code{printf} function will try to access one of the parameters but none 2219exists because the original code does not pass any parameters. 2220 2221@code{xgettext} of course could make a wrong decision the other way 2222round, i.e.@: a string marked as a format string actually is not a format 2223string. In this case the @code{msgfmt} might give too many warnings and 2224would prevent translating the @file{.po} file. The method to prevent 2225this wrong decision is similar to the one used above, only the comment 2226to use must contain the string @code{xgettext:no-c-format}. 2227 2228If a string is marked with @code{c-format} and this is not correct the 2229user can find out who is responsible for the decision. See 2230@ref{xgettext Invocation} to see how the @code{--debug} option can be 2231used for solving this problem. 2232 2233@node Special cases, Names, c-format Flag, Sources 2234@section Special Cases of Translatable Strings 2235 2236@cindex marking string initializers 2237The attentive reader might now point out that it is not always possible 2238to mark translatable string with @code{gettext} or something like this. 2239Consider the following case: 2240 2241@example 2242@group 2243@{ 2244 static const char *messages[] = @{ 2245 "some very meaningful message", 2246 "and another one" 2247 @}; 2248 const char *string; 2249 @dots{} 2250 string 2251 = index > 1 ? "a default message" : messages[index]; 2252 2253 fputs (string); 2254 @dots{} 2255@} 2256@end group 2257@end example 2258 2259While it is no problem to mark the string @code{"a default message"} it 2260is not possible to mark the string initializers for @code{messages}. 2261What is to be done? We have to fulfill two tasks. First we have to mark the 2262strings so that the @code{xgettext} program (@pxref{xgettext Invocation}) 2263can find them, and second we have to translate the string at runtime 2264before printing them. 2265 2266The first task can be fulfilled by creating a new keyword, which names a 2267no-op. For the second we have to mark all access points to a string 2268from the array. So one solution can look like this: 2269 2270@example 2271@group 2272#define gettext_noop(String) String 2273 2274@{ 2275 static const char *messages[] = @{ 2276 gettext_noop ("some very meaningful message"), 2277 gettext_noop ("and another one") 2278 @}; 2279 const char *string; 2280 @dots{} 2281 string 2282 = index > 1 ? gettext ("a default message") : gettext (messages[index]); 2283 2284 fputs (string); 2285 @dots{} 2286@} 2287@end group 2288@end example 2289 2290Please convince yourself that the string which is written by 2291@code{fputs} is translated in any case. How to get @code{xgettext} know 2292the additional keyword @code{gettext_noop} is explained in @ref{xgettext 2293Invocation}. 2294 2295The above is of course not the only solution. You could also come along 2296with the following one: 2297 2298@example 2299@group 2300#define gettext_noop(String) String 2301 2302@{ 2303 static const char *messages[] = @{ 2304 gettext_noop ("some very meaningful message", 2305 gettext_noop ("and another one") 2306 @}; 2307 const char *string; 2308 @dots{} 2309 string 2310 = index > 1 ? gettext_noop ("a default message") : messages[index]; 2311 2312 fputs (gettext (string)); 2313 @dots{} 2314@} 2315@end group 2316@end example 2317 2318But this has a drawback. The programmer has to take care that 2319he uses @code{gettext_noop} for the string @code{"a default message"}. 2320A use of @code{gettext} could have in rare cases unpredictable results. 2321 2322One advantage is that you need not make control flow analysis to make 2323sure the output is really translated in any case. But this analysis is 2324generally not very difficult. If it should be in any situation you can 2325use this second method in this situation. 2326 2327@node Names, Libraries, Special cases, Sources 2328@section Marking Proper Names for Translation 2329 2330Should names of persons, cities, locations etc. be marked for translation 2331or not? People who only know languages that can be written with Latin 2332letters (English, Spanish, French, German, etc.) are tempted to say ``no'', 2333because names usually do not change when transported between these languages. 2334However, in general when translating from one script to another, names 2335are translated too, usually phonetically or by transliteration. For 2336example, Russian or Greek names are converted to the Latin alphabet when 2337being translated to English, and English or French names are converted 2338to the Katakana script when being translated to Japanese. This is 2339necessary because the speakers of the target language in general cannot 2340read the script the name is originally written in. 2341 2342As a programmer, you should therefore make sure that names are marked 2343for translation, with a special comment telling the translators that it 2344is a proper name and how to pronounce it. Like this: 2345 2346@example 2347@group 2348printf (_("Written by %s.\n"), 2349 /* TRANSLATORS: This is a proper name. See the gettext 2350 manual, section Names. Note this is actually a non-ASCII 2351 name: The first name is (with Unicode escapes) 2352 "Fran\u00e7ois" or (with HTML entities) "François". 2353 Pronunciation is like "fraa-swa pee-nar". */ 2354 _("Francois Pinard")); 2355@end group 2356@end example 2357 2358As a translator, you should use some care when translating names, because 2359it is frustrating if people see their names mutilated or distorted. If 2360your language uses the Latin script, all you need to do is to reproduce 2361the name as perfectly as you can within the usual character set of your 2362language. In this particular case, this means to provide a translation 2363containing the c-cedilla character. If your language uses a different 2364script and the people speaking it don't usually read Latin words, it means 2365transliteration; but you should still give, in parentheses, the original 2366writing of the name -- for the sake of the people that do read the Latin 2367script. Here is an example, using Greek as the target script: 2368 2369@example 2370@group 2371#. This is a proper name. See the gettext 2372#. manual, section Names. Note this is actually a non-ASCII 2373#. name: The first name is (with Unicode escapes) 2374#. "Fran\u00e7ois" or (with HTML entities) "François". 2375#. Pronunciation is like "fraa-swa pee-nar". 2376msgid "Francois Pinard" 2377msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho" 2378 " (Francois Pinard)" 2379@end group 2380@end example 2381 2382Because translation of names is such a sensitive domain, it is a good 2383idea to test your translation before submitting it. 2384 2385The translation project @url{http://sourceforge.net/projects/translation} 2386has set up a POT file and translation domain consisting of program author 2387names, with better facilities for the translator than those presented here. 2388Namely, there the original name is written directly in Unicode (rather 2389than with Unicode escapes or HTML entities), and the pronunciation is 2390denoted using the International Phonetic Alphabet (see 2391@url{http://www.wikipedia.org/wiki/International_Phonetic_Alphabet}). 2392 2393However, we don't recommend this approach for all POT files in all packages, 2394because this would force translators to use PO files in UTF-8 encoding, 2395which is - in the current state of software (as of 2003) - a major hassle 2396for translators using GNU Emacs or XEmacs with po-mode. 2397 2398@node Libraries, , Names, Sources 2399@section Preparing Library Sources 2400 2401When you are preparing a library, not a program, for the use of 2402@code{gettext}, only a few details are different. Here we assume that 2403the library has a translation domain and a POT file of its own. (If 2404it uses the translation domain and POT file of the main program, then 2405the previous sections apply without changes.) 2406 2407@enumerate 2408@item 2409The library code doesn't call @code{setlocale (LC_ALL, "")}. It's the 2410responsibility of the main program to set the locale. The library's 2411documentation should mention this fact, so that developers of programs 2412using the library are aware of it. 2413 2414@item 2415The library code doesn't call @code{textdomain (PACKAGE)}, because it 2416would interfere with the text domain set by the main program. 2417 2418@item 2419The initialization code for a program was 2420 2421@smallexample 2422 setlocale (LC_ALL, ""); 2423 bindtextdomain (PACKAGE, LOCALEDIR); 2424 textdomain (PACKAGE); 2425@end smallexample 2426 2427@noindent 2428For a library it is reduced to 2429 2430@smallexample 2431 bindtextdomain (PACKAGE, LOCALEDIR); 2432@end smallexample 2433 2434@noindent 2435If your library's API doesn't already have an initialization function, 2436you need to create one, containing at least the @code{bindtextdomain} 2437invocation. However, you usually don't need to export and document this 2438initialization function: It is sufficient that all entry points of the 2439library call the initialization function if it hasn't been called before. 2440The typical idiom used to achieve this is a static boolean variable that 2441indicates whether the initialization function has been called. Like this: 2442 2443@example 2444@group 2445static bool libfoo_initialized; 2446 2447static void 2448libfoo_initialize (void) 2449@{ 2450 bindtextdomain (PACKAGE, LOCALEDIR); 2451 libfoo_initialized = true; 2452@} 2453 2454/* This function is part of the exported API. */ 2455struct foo * 2456create_foo (...) 2457@{ 2458 /* Must ensure the initialization is performed. */ 2459 if (!libfoo_initialized) 2460 libfoo_initialize (); 2461 ... 2462@} 2463 2464/* This function is part of the exported API. The argument must be 2465 non-NULL and have been created through create_foo(). */ 2466int 2467foo_refcount (struct foo *argument) 2468@{ 2469 /* No need to invoke the initialization function here, because 2470 create_foo() must already have been called before. */ 2471 ... 2472@} 2473@end group 2474@end example 2475 2476@item 2477The usual declaration of the @samp{_} macro in each source file was 2478 2479@smallexample 2480#include <libintl.h> 2481#define _(String) gettext (String) 2482@end smallexample 2483 2484@noindent 2485for a program. For a library, which has its own translation domain, 2486it reads like this: 2487 2488@smallexample 2489#include <libintl.h> 2490#define _(String) dgettext (PACKAGE, String) 2491@end smallexample 2492 2493In other words, @code{dgettext} is used instead of @code{gettext}. 2494Similarly, the @code{dngettext} function should be used in place of the 2495@code{ngettext} function. 2496@end enumerate 2497 2498@node Template, Creating, Sources, Top 2499@chapter Making the PO Template File 2500@cindex PO template file 2501 2502After preparing the sources, the programmer creates a PO template file. 2503This section explains how to use @code{xgettext} for this purpose. 2504 2505@code{xgettext} creates a file named @file{@var{domainname}.po}. You 2506should then rename it to @file{@var{domainname}.pot}. (Why doesn't 2507@code{xgettext} create it under the name @file{@var{domainname}.pot} 2508right away? The answer is: for historical reasons. When @code{xgettext} 2509was specified, the distinction between a PO file and PO file template 2510was fuzzy, and the suffix @samp{.pot} wasn't in use at that time.) 2511 2512@c FIXME: Rewrite. 2513 2514@menu 2515* xgettext Invocation:: Invoking the @code{xgettext} Program 2516@end menu 2517 2518@node xgettext Invocation, , Template, Template 2519@section Invoking the @code{xgettext} Program 2520 2521@include xgettext.texi 2522 2523@node Creating, Updating, Template, Top 2524@chapter Creating a New PO File 2525@cindex creating a new PO file 2526 2527When starting a new translation, the translator creates a file called 2528@file{@var{LANG}.po}, as a copy of the @file{@var{package}.pot} template 2529file with modifications in the initial comments (at the beginning of the file) 2530and in the header entry (the first entry, near the beginning of the file). 2531 2532The easiest way to do so is by use of the @samp{msginit} program. 2533For example: 2534 2535@example 2536$ cd @var{PACKAGE}-@var{VERSION} 2537$ cd po 2538$ msginit 2539@end example 2540 2541The alternative way is to do the copy and modifications by hand. 2542To do so, the translator copies @file{@var{package}.pot} to 2543@file{@var{LANG}.po}. Then she modifies the initial comments and 2544the header entry of this file. 2545 2546@menu 2547* msginit Invocation:: Invoking the @code{msginit} Program 2548* Header Entry:: Filling in the Header Entry 2549@end menu 2550 2551@node msginit Invocation, Header Entry, Creating, Creating 2552@section Invoking the @code{msginit} Program 2553 2554@include msginit.texi 2555 2556@node Header Entry, , msginit Invocation, Creating 2557@section Filling in the Header Entry 2558@cindex header entry of a PO file 2559 2560The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and 2561"FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible 2562information. This can be done in any text editor; if Emacs is used 2563and it switched to PO mode automatically (because it has recognized 2564the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}. 2565 2566Modifying the header entry can already be done using PO mode: in Emacs, 2567type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the 2568entry. You should fill in the following fields. 2569 2570@table @asis 2571@item Project-Id-Version 2572This is the name and version of the package. 2573 2574@item Report-Msgid-Bugs-To 2575This has already been filled in by @code{xgettext}. It contains an email 2576address or URL where you can report bugs in the untranslated strings: 2577 2578@itemize - 2579@item Strings which are not entire sentences, see the maintainer guidelines 2580in @ref{Preparing Strings}. 2581@item Strings which use unclear terms or require additional context to be 2582understood. 2583@item Strings which make invalid assumptions about notation of date, time or 2584money. 2585@item Pluralisation problems. 2586@item Incorrect English spelling. 2587@item Incorrect formatting. 2588@end itemize 2589 2590@item POT-Creation-Date 2591This has already been filled in by @code{xgettext}. 2592 2593@item PO-Revision-Date 2594You don't need to fill this in. It will be filled by the PO file editor 2595when you save the file. 2596 2597@item Last-Translator 2598Fill in your name and email address (without double quotes). 2599 2600@item Language-Team 2601Fill in the English name of the language, and the email address or 2602homepage URL of the language team you are part of. 2603 2604Before starting a translation, it is a good idea to get in touch with 2605your translation team, not only to make sure you don't do duplicated work, 2606but also to coordinate difficult linguistic issues. 2607 2608@cindex list of translation teams, where to find 2609In the Free Translation Project, each translation team has its own mailing 2610list. The up-to-date list of teams can be found at the Free Translation 2611Project's homepage, @uref{http://www.iro.umontreal.ca/contrib/po/HTML/}, 2612in the "National teams" area. 2613 2614@item Content-Type 2615@cindex encoding of PO files 2616@cindex charset of PO files 2617Replace @samp{CHARSET} with the character encoding used for your language, 2618in your locale, or UTF-8. This field is needed for correct operation of the 2619@code{msgmerge} and @code{msgfmt} programs, as well as for users whose 2620locale's character encoding differs from yours (see @ref{Charset conversion}). 2621 2622@cindex @code{locale} program 2623You get the character encoding of your locale by running the shell command 2624@samp{locale charmap}. If the result is @samp{C} or @samp{ANSI_X3.4-1968}, 2625which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your 2626locale is not correctly configured. In this case, ask your translation 2627team which charset to use. @samp{ASCII} is not usable for any language 2628except Latin. 2629 2630@cindex encoding list 2631Because the PO files must be portable to operating systems with less advanced 2632internationalization facilities, the character encodings that can be used 2633are limited to those supported by both GNU @code{libc} and GNU 2634@code{libiconv}. These are: 2635@code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3}, 2636@code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7}, 2637@code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-14}, 2638@code{ISO-8859-15}, 2639@code{KOI8-R}, @code{KOI8-U}, @code{KOI8-T}, 2640@code{CP850}, @code{CP866}, @code{CP874}, 2641@code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251}, 2642@code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256}, 2643@code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW}, 2644@code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS}, 2645@code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{GEORGIAN-PS}, @code{UTF-8}. 2646 2647@c This data is taken from glibc/localedata/SUPPORTED. 2648@cindex Linux 2649In the GNU system, the following encodings are frequently used for the 2650corresponding languages. 2651 2652@cindex encoding for your language 2653@itemize 2654@item @code{ISO-8859-1} for 2655Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, 2656English, Estonian, Faroese, Finnish, French, Galician, German, 2657Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx, 2658Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek, 2659Walloon, 2660@item @code{ISO-8859-2} for 2661Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, 2662Slovenian, 2663@item @code{ISO-8859-3} for Maltese, 2664@item @code{ISO-8859-5} for Macedonian, Serbian, 2665@item @code{ISO-8859-6} for Arabic, 2666@item @code{ISO-8859-7} for Greek, 2667@item @code{ISO-8859-8} for Hebrew, 2668@item @code{ISO-8859-9} for Turkish, 2669@item @code{ISO-8859-13} for Latvian, Lithuanian, Maori, 2670@item @code{ISO-8859-14} for Welsh, 2671@item @code{ISO-8859-15} for 2672Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish, 2673Italian, Portuguese, Spanish, Swedish, Walloon, 2674@item @code{KOI8-R} for Russian, 2675@item @code{KOI8-U} for Ukrainian, 2676@item @code{KOI8-T} for Tajik, 2677@item @code{CP1251} for Bulgarian, Byelorussian, 2678@item @code{GB2312}, @code{GBK}, @code{GB18030} 2679for simplified writing of Chinese, 2680@item @code{BIG5}, @code{BIG5-HKSCS} 2681for traditional writing of Chinese, 2682@item @code{EUC-JP} for Japanese, 2683@item @code{EUC-KR} for Korean, 2684@item @code{TIS-620} for Thai, 2685@item @code{GEORGIAN-PS} for Georgian, 2686@item @code{UTF-8} for any language, including those listed above. 2687@end itemize 2688 2689@cindex quote characters, use in PO files 2690@cindex quotation marks 2691When single quote characters or double quote characters are used in 2692translations for your language, and your locale's encoding is one of the 2693ISO-8859-* charsets, it is best if you create your PO files in UTF-8 2694encoding, instead of your locale's encoding. This is because in UTF-8 2695the real quote characters can be represented (single quote characters: 2696U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of 2697ISO-8859-* charsets has them all. Users in UTF-8 locales will see the 2698real quote characters, whereas users in ISO-8859-* locales will see the 2699vertical apostrophe and the vertical double quote instead (because that's 2700what the character set conversion will transliterate them to). 2701 2702@cindex @code{xmodmap} program, and typing quotation marks 2703To enter such quote characters under X11, you can change your keyboard 2704mapping using the @code{xmodmap} program. The X11 names of the quote 2705characters are "leftsinglequotemark", "rightsinglequotemark", 2706"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark", 2707"doublelowquotemark". 2708 2709Note that only recent versions of GNU Emacs support the UTF-8 encoding: 2710Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't 2711support the UTF-8 encoding. 2712 2713The character encoding name can be written in either upper or lower case. 2714Usually upper case is preferred. 2715 2716@item Content-Transfer-Encoding 2717Set this to @code{8bit}. 2718 2719@item Plural-Forms 2720This field is optional. It is only needed if the PO file has plural forms. 2721You can find them by searching for the @samp{msgid_plural} keyword. The 2722format of the plural forms field is described in @ref{Plural forms}. 2723@end table 2724 2725@node Updating, Editing, Creating, Top 2726@chapter Updating Existing PO Files 2727 2728@menu 2729* msgmerge Invocation:: Invoking the @code{msgmerge} Program 2730@end menu 2731 2732@node msgmerge Invocation, , Updating, Updating 2733@section Invoking the @code{msgmerge} Program 2734 2735@include msgmerge.texi 2736 2737@node Editing, Manipulating, Updating, Top 2738@chapter Editing PO Files 2739@cindex Editing PO Files 2740 2741@menu 2742* KBabel:: KDE's PO File Editor 2743* Gtranslator:: GNOME's PO File Editor 2744* PO Mode:: Emacs's PO File Editor 2745@end menu 2746 2747@node KBabel, Gtranslator, Editing, Editing 2748@section KDE's PO File Editor 2749@cindex KDE PO file editor 2750 2751@node Gtranslator, PO Mode, KBabel, Editing 2752@section GNOME's PO File Editor 2753@cindex GNOME PO file editor 2754 2755@node PO Mode, , Gtranslator, Editing 2756@section Emacs's PO File Editor 2757@cindex Emacs PO Mode 2758 2759@c FIXME: Rewrite. 2760 2761For those of you being 2762the lucky users of Emacs, PO mode has been specifically created 2763for providing a cozy environment for editing or modifying PO files. 2764While editing a PO file, PO mode allows for the easy browsing of 2765auxiliary and compendium PO files, as well as for following references into 2766the set of C program sources from which PO files have been derived. 2767It has a few special features, among which are the interactive marking 2768of program strings as translatable, and the validation of PO files 2769with easy repositioning to PO file lines showing errors. 2770 2771For the beginning, besides main PO mode commands 2772(@pxref{Main PO Commands}), you should know how to move between entries 2773(@pxref{Entry Positioning}), and how to handle untranslated entries 2774(@pxref{Untranslated Entries}). 2775 2776@menu 2777* Installation:: Completing GNU @code{gettext} Installation 2778* Main PO Commands:: Main Commands 2779* Entry Positioning:: Entry Positioning 2780* Normalizing:: Normalizing Strings in Entries 2781* Translated Entries:: Translated Entries 2782* Fuzzy Entries:: Fuzzy Entries 2783* Untranslated Entries:: Untranslated Entries 2784* Obsolete Entries:: Obsolete Entries 2785* Modifying Translations:: Modifying Translations 2786* Modifying Comments:: Modifying Comments 2787* Subedit:: Mode for Editing Translations 2788* C Sources Context:: C Sources Context 2789* Auxiliary:: Consulting Auxiliary PO Files 2790* Compendium:: Using Translation Compendia 2791@end menu 2792 2793@node Installation, Main PO Commands, PO Mode, PO Mode 2794@subsection Completing GNU @code{gettext} Installation 2795 2796@cindex installing @code{gettext} 2797@cindex @code{gettext} installation 2798Once you have received, unpacked, configured and compiled the GNU 2799@code{gettext} distribution, the @samp{make install} command puts in 2800place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and 2801@code{msgmerge}, as well as their available message catalogs. To 2802top off a comfortable installation, you might also want to make the 2803PO mode available to your Emacs users. 2804 2805@emindex @file{.emacs} customizations 2806@emindex installing PO mode 2807During the installation of the PO mode, you might want to modify your 2808file @file{.emacs}, once and for all, so it contains a few lines looking 2809like: 2810 2811@example 2812(setq auto-mode-alist 2813 (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist)) 2814(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t) 2815@end example 2816 2817Later, whenever you edit some @file{.po} 2818file, or any file having the string @samp{.po.} within its name, 2819Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and 2820automatically activates PO mode commands for the associated buffer. 2821The string @emph{PO} appears in the mode line for any buffer for 2822which PO mode is active. Many PO files may be active at once in a 2823single Emacs session. 2824 2825If you are using Emacs version 20 or newer, and have already installed 2826the appropriate international fonts on your system, you may also tell 2827Emacs how to determine automatically the coding system of every PO file. 2828This will often (but not always) cause the necessary fonts to be loaded 2829and used for displaying the translations on your Emacs screen. For this 2830to happen, add the lines: 2831 2832@example 2833(modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\." 2834 'po-find-file-coding-system) 2835(autoload 'po-find-file-coding-system "po-mode") 2836@end example 2837 2838@noindent 2839to your @file{.emacs} file. If, with this, you still see boxes instead 2840of international characters, try a different font set (via Shift Mouse 2841button 1). 2842 2843@node Main PO Commands, Entry Positioning, Installation, PO Mode 2844@subsection Main PO mode Commands 2845 2846@cindex PO mode (Emacs) commands 2847@emindex commands 2848After setting up Emacs with something similar to the lines in 2849@ref{Installation}, PO mode is activated for a window when Emacs finds a 2850PO file in that window. This puts the window read-only and establishes a 2851po-mode-map, which is a genuine Emacs mode, in a way that is not derived 2852from text mode in any way. Functions found on @code{po-mode-hook}, 2853if any, will be executed. 2854 2855When PO mode is active in a window, the letters @samp{PO} appear 2856in the mode line for that window. The mode line also displays how 2857many entries of each kind are held in the PO file. For example, 2858the string @samp{132t+3f+10u+2o} would tell the translator that the 2859PO mode contains 132 translated entries (@pxref{Translated Entries}, 28603 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries 2861(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete 2862Entries}). Zero-coefficients items are not shown. So, in this example, if 2863the fuzzy entries were unfuzzied, the untranslated entries were translated 2864and the obsolete entries were deleted, the mode line would merely display 2865@samp{145t} for the counters. 2866 2867The main PO commands are those which do not fit into the other categories of 2868subsequent sections. These allow for quitting PO mode or for managing windows 2869in special ways. 2870 2871@table @kbd 2872@item _ 2873@efindex _@r{, PO Mode command} 2874Undo last modification to the PO file (@code{po-undo}). 2875 2876@item Q 2877@efindex Q@r{, PO Mode command} 2878Quit processing and save the PO file (@code{po-quit}). 2879 2880@item q 2881@efindex q@r{, PO Mode command} 2882Quit processing, possibly after confirmation (@code{po-confirm-and-quit}). 2883 2884@item 0 2885@efindex 0@r{, PO Mode command} 2886Temporary leave the PO file window (@code{po-other-window}). 2887 2888@item ? 2889@itemx h 2890@efindex ?@r{, PO Mode command} 2891@efindex h@r{, PO Mode command} 2892Show help about PO mode (@code{po-help}). 2893 2894@item = 2895@efindex =@r{, PO Mode command} 2896Give some PO file statistics (@code{po-statistics}). 2897 2898@item V 2899@efindex V@r{, PO Mode command} 2900Batch validate the format of the whole PO file (@code{po-validate}). 2901 2902@end table 2903 2904@efindex _@r{, PO Mode command} 2905@efindex po-undo@r{, PO Mode command} 2906The command @kbd{_} (@code{po-undo}) interfaces to the Emacs 2907@emph{undo} facility. @xref{Undo, , Undoing Changes, emacs, The Emacs 2908Editor}. Each time @kbd{U} is typed, modifications which the translator 2909did to the PO file are undone a little more. For the purpose of 2910undoing, each PO mode command is atomic. This is especially true for 2911the @kbd{@key{RET}} command: the whole edition made by using a single 2912use of this command is undone at once, even if the edition itself 2913implied several actions. However, while in the editing window, one 2914can undo the edition work quite parsimoniously. 2915 2916@efindex Q@r{, PO Mode command} 2917@efindex q@r{, PO Mode command} 2918@efindex po-quit@r{, PO Mode command} 2919@efindex po-confirm-and-quit@r{, PO Mode command} 2920The commands @kbd{Q} (@code{po-quit}) and @kbd{q} 2921(@code{po-confirm-and-quit}) are used when the translator is done with the 2922PO file. The former is a bit less verbose than the latter. If the file 2923has been modified, it is saved to disk first. In both cases, and prior to 2924all this, the commands check if any untranslated messages remain in the 2925PO file and, if so, the translator is asked if she really wants to leave 2926off working with this PO file. This is the preferred way of getting rid 2927of an Emacs PO file buffer. Merely killing it through the usual command 2928@w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed. 2929 2930@efindex 0@r{, PO Mode command} 2931@efindex po-other-window@r{, PO Mode command} 2932The command @kbd{0} (@code{po-other-window}) is another, softer way, 2933to leave PO mode, temporarily. It just moves the cursor to some other 2934Emacs window, and pops one if necessary. For example, if the translator 2935just got PO mode to show some source context in some other, she might 2936discover some apparent bug in the program source that needs correction. 2937This command allows the translator to change sex, become a programmer, 2938and have the cursor right into the window containing the program she 2939(or rather @emph{he}) wants to modify. By later getting the cursor back 2940in the PO file window, or by asking Emacs to edit this file once again, 2941PO mode is then recovered. 2942 2943@efindex ?@r{, PO Mode command} 2944@efindex h@r{, PO Mode command} 2945@efindex po-help@r{, PO Mode command} 2946The command @kbd{h} (@code{po-help}) displays a summary of all available PO 2947mode commands. The translator should then type any character to resume 2948normal PO mode operations. The command @kbd{?} has the same effect 2949as @kbd{h}. 2950 2951@efindex =@r{, PO Mode command} 2952@efindex po-statistics@r{, PO Mode command} 2953The command @kbd{=} (@code{po-statistics}) computes the total number of 2954entries in the PO file, the ordinal of the current entry (counted from 29551), the number of untranslated entries, the number of obsolete entries, 2956and displays all these numbers. 2957 2958@efindex V@r{, PO Mode command} 2959@efindex po-validate@r{, PO Mode command} 2960The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in 2961checking and verbose 2962mode over the current PO file. This command first offers to save the 2963current PO file on disk. The @code{msgfmt} tool, from GNU @code{gettext}, 2964has the purpose of creating a MO file out of a PO file, and PO mode uses 2965the features of this program for checking the overall format of a PO file, 2966as well as all individual entries. 2967 2968@efindex next-error@r{, stepping through PO file validation results} 2969The program @code{msgfmt} runs asynchronously with Emacs, so the 2970translator regains control immediately while her PO file is being studied. 2971Error output is collected in the Emacs @samp{*compilation*} buffer, 2972displayed in another window. The regular Emacs command @kbd{C-x`} 2973(@code{next-error}), as well as other usual compile commands, allow the 2974translator to reposition quickly to the offending parts of the PO file. 2975Once the cursor is on the line in error, the translator may decide on 2976any PO mode action which would help correcting the error. 2977 2978@node Entry Positioning, Normalizing, Main PO Commands, PO Mode 2979@subsection Entry Positioning 2980 2981@emindex current entry of a PO file 2982The cursor in a PO file window is almost always part of 2983an entry. The only exceptions are the special case when the cursor 2984is after the last entry in the file, or when the PO file is 2985empty. The entry where the cursor is found to be is said to be the 2986current entry. Many PO mode commands operate on the current entry, 2987so moving the cursor does more than allowing the translator to browse 2988the PO file, this also selects on which entry commands operate. 2989 2990@emindex moving through a PO file 2991Some PO mode commands alter the position of the cursor in a specialized 2992way. A few of those special purpose positioning are described here, 2993the others are described in following sections (for a complete list try 2994@kbd{C-h m}): 2995 2996@table @kbd 2997 2998@item . 2999@efindex .@r{, PO Mode command} 3000Redisplay the current entry (@code{po-current-entry}). 3001 3002@item n 3003@efindex n@r{, PO Mode command} 3004Select the entry after the current one (@code{po-next-entry}). 3005 3006@item p 3007@efindex p@r{, PO Mode command} 3008Select the entry before the current one (@code{po-previous-entry}). 3009 3010@item < 3011@efindex <@r{, PO Mode command} 3012Select the first entry in the PO file (@code{po-first-entry}). 3013 3014@item > 3015@efindex >@r{, PO Mode command} 3016Select the last entry in the PO file (@code{po-last-entry}). 3017 3018@item m 3019@efindex m@r{, PO Mode command} 3020Record the location of the current entry for later use 3021(@code{po-push-location}). 3022 3023@item r 3024@efindex r@r{, PO Mode command} 3025Return to a previously saved entry location (@code{po-pop-location}). 3026 3027@item x 3028@efindex x@r{, PO Mode command} 3029Exchange the current entry location with the previously saved one 3030(@code{po-exchange-location}). 3031 3032@end table 3033 3034@efindex .@r{, PO Mode command} 3035@efindex po-current-entry@r{, PO Mode command} 3036Any Emacs command able to reposition the cursor may be used 3037to select the current entry in PO mode, including commands which 3038move by characters, lines, paragraphs, screens or pages, and search 3039commands. However, there is a kind of standard way to display the 3040current entry in PO mode, which usual Emacs commands moving 3041the cursor do not especially try to enforce. The command @kbd{.} 3042(@code{po-current-entry}) has the sole purpose of redisplaying the 3043current entry properly, after the current entry has been changed by 3044means external to PO mode, or the Emacs screen otherwise altered. 3045 3046It is yet to be decided if PO mode helps the translator, or otherwise 3047irritates her, by forcing a rigid window disposition while she 3048is doing her work. We originally had quite precise ideas about 3049how windows should behave, but on the other hand, anyone used to 3050Emacs is often happy to keep full control. Maybe a fixed window 3051disposition might be offered as a PO mode option that the translator 3052might activate or deactivate at will, so it could be offered on an 3053experimental basis. If nobody feels a real need for using it, or 3054a compulsion for writing it, we should drop this whole idea. 3055The incentive for doing it should come from translators rather than 3056programmers, as opinions from an experienced translator are surely 3057more worth to me than opinions from programmers @emph{thinking} about 3058how @emph{others} should do translation. 3059 3060@efindex n@r{, PO Mode command} 3061@efindex po-next-entry@r{, PO Mode command} 3062@efindex p@r{, PO Mode command} 3063@efindex po-previous-entry@r{, PO Mode command} 3064The commands @kbd{n} (@code{po-next-entry}) and @kbd{p} 3065(@code{po-previous-entry}) move the cursor the entry following, 3066or preceding, the current one. If @kbd{n} is given while the 3067cursor is on the last entry of the PO file, or if @kbd{p} 3068is given while the cursor is on the first entry, no move is done. 3069 3070@efindex <@r{, PO Mode command} 3071@efindex po-first-entry@r{, PO Mode command} 3072@efindex >@r{, PO Mode command} 3073@efindex po-last-entry@r{, PO Mode command} 3074The commands @kbd{<} (@code{po-first-entry}) and @kbd{>} 3075(@code{po-last-entry}) move the cursor to the first entry, or last 3076entry, of the PO file. When the cursor is located past the last 3077entry in a PO file, most PO mode commands will return an error saying 3078@samp{After last entry}. Moreover, the commands @kbd{<} and @kbd{>} 3079have the special property of being able to work even when the cursor 3080is not into some PO file entry, and one may use them for nicely 3081correcting this situation. But even these commands will fail on a 3082truly empty PO file. There are development plans for the PO mode for it 3083to interactively fill an empty PO file from sources. @xref{Marking}. 3084 3085The translator may decide, before working at the translation of 3086a particular entry, that she needs to browse the remainder of the 3087PO file, maybe for finding the terminology or phraseology used 3088in related entries. She can of course use the standard Emacs idioms 3089for saving the current cursor location in some register, and use that 3090register for getting back, or else, use the location ring. 3091 3092@efindex m@r{, PO Mode command} 3093@efindex po-push-location@r{, PO Mode command} 3094@efindex r@r{, PO Mode command} 3095@efindex po-pop-location@r{, PO Mode command} 3096PO mode offers another approach, by which cursor locations may be saved 3097onto a special stack. The command @kbd{m} (@code{po-push-location}) 3098merely adds the location of current entry to the stack, pushing 3099the already saved locations under the new one. The command 3100@kbd{r} (@code{po-pop-location}) consumes the top stack element and 3101repositions the cursor to the entry associated with that top element. 3102This position is then lost, for the next @kbd{r} will move the cursor 3103to the previously saved location, and so on until no locations remain 3104on the stack. 3105 3106If the translator wants the position to be kept on the location stack, 3107maybe for taking a look at the entry associated with the top 3108element, then go elsewhere with the intent of getting back later, she 3109ought to use @kbd{m} immediately after @kbd{r}. 3110 3111@efindex x@r{, PO Mode command} 3112@efindex po-exchange-location@r{, PO Mode command} 3113The command @kbd{x} (@code{po-exchange-location}) simultaneously 3114repositions the cursor to the entry associated with the top element of 3115the stack of saved locations, and replaces that top element with the 3116location of the current entry before the move. Consequently, repeating 3117the @kbd{x} command toggles alternatively between two entries. 3118For achieving this, the translator will position the cursor on the 3119first entry, use @kbd{m}, then position to the second entry, and 3120merely use @kbd{x} for making the switch. 3121 3122@node Normalizing, Translated Entries, Entry Positioning, PO Mode 3123@subsection Normalizing Strings in Entries 3124@cindex string normalization in entries 3125 3126There are many different ways for encoding a particular string into a 3127PO file entry, because there are so many different ways to split and 3128quote multi-line strings, and even, to represent special characters 3129by backslashed escaped sequences. Some features of PO mode rely on 3130the ability for PO mode to scan an already existing PO file for a 3131particular string encoded into the @code{msgid} field of some entry. 3132Even if PO mode has internally all the built-in machinery for 3133implementing this recognition easily, doing it fast is technically 3134difficult. To facilitate a solution to this efficiency problem, 3135we decided on a canonical representation for strings. 3136 3137A conventional representation of strings in a PO file is currently 3138under discussion, and PO mode experiments with a canonical representation. 3139Having both @code{xgettext} and PO mode converging towards a uniform 3140way of representing equivalent strings would be useful, as the internal 3141normalization needed by PO mode could be automatically satisfied 3142when using @code{xgettext} from GNU @code{gettext}. An explicit 3143PO mode normalization should then be only necessary for PO files 3144imported from elsewhere, or for when the convention itself evolves. 3145 3146So, for achieving normalization of at least the strings of a given 3147PO file needing a canonical representation, the following PO mode 3148command is available: 3149 3150@emindex string normalization in entries 3151@table @kbd 3152@item M-x po-normalize 3153@efindex po-normalize@r{, PO Mode command} 3154Tidy the whole PO file by making entries more uniform. 3155 3156@end table 3157 3158The special command @kbd{M-x po-normalize}, which has no associated 3159keys, revises all entries, ensuring that strings of both original 3160and translated entries use uniform internal quoting in the PO file. 3161It also removes any crumb after the last entry. This command may be 3162useful for PO files freshly imported from elsewhere, or if we ever 3163improve on the canonical quoting format we use. This canonical format 3164is not only meant for getting cleaner PO files, but also for greatly 3165speeding up @code{msgid} string lookup for some other PO mode commands. 3166 3167@kbd{M-x po-normalize} presently makes three passes over the entries. 3168The first implements heuristics for converting PO files for GNU 3169@code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr} 3170fields were using K&R style C string syntax for multi-line strings. 3171These heuristics may fail for comments not related to obsolete 3172entries and ending with a backslash; they also depend on subsequent 3173passes for finalizing the proper commenting of continued lines for 3174obsolete entries. This first pass might disappear once all oldish PO 3175files would have been adjusted. The second and third pass normalize 3176all @code{msgid} and @code{msgstr} strings respectively. They also 3177clean out those trailing backslashes used by XView's @code{msgfmt} 3178for continued lines. 3179 3180@cindex importing PO files 3181Having such an explicit normalizing command allows for importing PO 3182files from other sources, but also eases the evolution of the current 3183convention, evolution driven mostly by aesthetic concerns, as of now. 3184It is easy to make suggested adjustments at a later time, as the 3185normalizing command and eventually, other GNU @code{gettext} tools 3186should greatly automate conformance. A description of the canonical 3187string format is given below, for the particular benefit of those not 3188having Emacs handy, and who would nevertheless want to handcraft 3189their PO files in nice ways. 3190 3191@cindex multi-line strings 3192Right now, in PO mode, strings are single line or multi-line. A string 3193goes multi-line if and only if it has @emph{embedded} newlines, that 3194is, if it matches @samp{[^\n]\n+[^\n]}. So, we would have: 3195 3196@example 3197msgstr "\n\nHello, world!\n\n\n" 3198@end example 3199 3200but, replacing the space by a newline, this becomes: 3201 3202@example 3203msgstr "" 3204"\n" 3205"\n" 3206"Hello,\n" 3207"world!\n" 3208"\n" 3209"\n" 3210@end example 3211 3212We are deliberately using a caricatural example, here, to make the 3213point clearer. Usually, multi-lines are not that bad looking. 3214It is probable that we will implement the following suggestion. 3215We might lump together all initial newlines into the empty string, 3216and also all newlines introducing empty lines (that is, for @w{@var{n} 3217> 1}, the @var{n}-1'th last newlines would go together on a separate 3218string), so making the previous example appear: 3219 3220@example 3221msgstr "\n\n" 3222"Hello,\n" 3223"world!\n" 3224"\n\n" 3225@end example 3226 3227There are a few yet undecided little points about string normalization, 3228to be documented in this manual, once these questions settle. 3229 3230@node Translated Entries, Fuzzy Entries, Normalizing, PO Mode 3231@subsection Translated Entries 3232@cindex translated entries 3233 3234Each PO file entry for which the @code{msgstr} field has been filled with 3235a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}), 3236is said to be a @dfn{translated} entry. Only translated entries will 3237later be compiled by GNU @code{msgfmt} and become usable in programs. 3238Other entry types will be excluded; translation will not occur for them. 3239 3240@emindex moving by translated entries 3241Some commands are more specifically related to translated entry processing. 3242 3243@table @kbd 3244@item t 3245@efindex t@r{, PO Mode command} 3246Find the next translated entry (@code{po-next-translated-entry}). 3247 3248@item T 3249@efindex T@r{, PO Mode command} 3250Find the previous translated entry (@code{po-previous-translated-entry}). 3251 3252@end table 3253 3254@efindex t@r{, PO Mode command} 3255@efindex po-next-translated-entry@r{, PO Mode command} 3256@efindex T@r{, PO Mode command} 3257@efindex po-previous-translated-entry@r{, PO Mode command} 3258The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{T} 3259(@code{po-previous-translated-entry}) move forwards or backwards, chasing 3260for an translated entry. If none is found, the search is extended and 3261wraps around in the PO file buffer. 3262 3263@evindex po-auto-fuzzy-on-edit@r{, PO Mode variable} 3264Translated entries usually result from the translator having edited in 3265a translation for them, @ref{Modifying Translations}. However, if the 3266variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, the entry having 3267received a new translation first becomes a fuzzy entry, which ought to 3268be later unfuzzied before becoming an official, genuine translated entry. 3269@xref{Fuzzy Entries}. 3270 3271@node Fuzzy Entries, Untranslated Entries, Translated Entries, PO Mode 3272@subsection Fuzzy Entries 3273@cindex fuzzy entries 3274 3275@cindex attributes of a PO file entry 3276@cindex attribute, fuzzy 3277Each PO file entry may have a set of @dfn{attributes}, which are 3278qualities given a name and explicitly associated with the translation, 3279using a special system comment. One of these attributes 3280has the name @code{fuzzy}, and entries having this attribute are said 3281to have a fuzzy translation. They are called fuzzy entries, for short. 3282 3283Fuzzy entries, even if they account for translated entries for 3284most other purposes, usually call for revision by the translator. 3285Those may be produced by applying the program @code{msgmerge} to 3286update an older translated PO files according to a new PO template 3287file, when this tool hypothesises that some new @code{msgid} has 3288been modified only slightly out of an older one, and chooses to pair 3289what it thinks to be the old translation for the new modified entry. 3290The slight alteration in the original string (the @code{msgid} string) 3291should often be reflected in the translated string, and this requires 3292the intervention of the translator. For this reason, @code{msgmerge} 3293might mark some entries as being fuzzy. 3294 3295@emindex moving by fuzzy entries 3296Also, the translator may decide herself to mark an entry as fuzzy 3297for her own convenience, when she wants to remember that the entry 3298has to be later revisited. So, some commands are more specifically 3299related to fuzzy entry processing. 3300 3301@table @kbd 3302@item z 3303@efindex z@r{, PO Mode command} 3304@c better append "-entry" all the time. -ke- 3305Find the next fuzzy entry (@code{po-next-fuzzy-entry}). 3306 3307@item Z 3308@efindex Z@r{, PO Mode command} 3309Find the previous fuzzy entry (@code{po-previous-fuzzy-entry}). 3310 3311@item @key{TAB} 3312@efindex TAB@r{, PO Mode command} 3313Remove the fuzzy attribute of the current entry (@code{po-unfuzzy}). 3314 3315@end table 3316 3317@efindex z@r{, PO Mode command} 3318@efindex po-next-fuzzy-entry@r{, PO Mode command} 3319@efindex Z@r{, PO Mode command} 3320@efindex po-previous-fuzzy-entry@r{, PO Mode command} 3321The commands @kbd{z} (@code{po-next-fuzzy-entry}) and @kbd{Z} 3322(@code{po-previous-fuzzy-entry}) move forwards or backwards, chasing for 3323a fuzzy entry. If none is found, the search is extended and wraps 3324around in the PO file buffer. 3325 3326@efindex TAB@r{, PO Mode command} 3327@efindex po-unfuzzy@r{, PO Mode command} 3328@evindex po-auto-select-on-unfuzzy@r{, PO Mode variable} 3329The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy 3330attribute associated with an entry, usually leaving it translated. 3331Further, if the variable @code{po-auto-select-on-unfuzzy} has not 3332the @code{nil} value, the @kbd{@key{TAB}} command will automatically chase 3333for another interesting entry to work on. The initial value of 3334@code{po-auto-select-on-unfuzzy} is @code{nil}. 3335 3336The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}. However, 3337if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry 3338edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to 3339ensure some kind of double check, later. In this case, the usual paradigm 3340is that an entry becomes fuzzy (if not already) whenever the translator 3341modifies it. If she is satisfied with the translation, she then uses 3342@kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute 3343on the same blow. If she is not satisfied yet, she merely uses @kbd{@key{SPC}} 3344to chase another entry, leaving the entry fuzzy. 3345 3346@efindex DEL@r{, PO Mode command} 3347@efindex po-fade-out-entry@r{, PO Mode command} 3348The translator may also use the @kbd{@key{DEL}} command 3349(@code{po-fade-out-entry}) over any translated entry to mark it as being 3350fuzzy, when she wants to easily leave a trace she wants to later return 3351working at this entry. 3352 3353Also, when time comes to quit working on a PO file buffer with the @kbd{q} 3354command, the translator is asked for confirmation, if fuzzy string 3355still exists. 3356 3357@node Untranslated Entries, Obsolete Entries, Fuzzy Entries, PO Mode 3358@subsection Untranslated Entries 3359@cindex untranslated entries 3360 3361When @code{xgettext} originally creates a PO file, unless told 3362otherwise, it initializes the @code{msgid} field with the untranslated 3363string, and leaves the @code{msgstr} string to be empty. Such entries, 3364having an empty translation, are said to be @dfn{untranslated} entries. 3365Later, when the programmer slightly modifies some string right in 3366the program, this change is later reflected in the PO file 3367by the appearance of a new untranslated entry for the modified string. 3368 3369The usual commands moving from entry to entry consider untranslated 3370entries on the same level as active entries. Untranslated entries 3371are easily recognizable by the fact they end with @w{@samp{msgstr ""}}. 3372 3373@emindex moving by untranslated entries 3374The work of the translator might be (quite naively) seen as the process 3375of seeking for an untranslated entry, editing a translation for 3376it, and repeating these actions until no untranslated entries remain. 3377Some commands are more specifically related to untranslated entry 3378processing. 3379 3380@table @kbd 3381@item u 3382@efindex u@r{, PO Mode command} 3383Find the next untranslated entry (@code{po-next-untranslated-entry}). 3384 3385@item U 3386@efindex U@r{, PO Mode command} 3387Find the previous untranslated entry (@code{po-previous-untransted-entry}). 3388 3389@item k 3390@efindex k@r{, PO Mode command} 3391Turn the current entry into an untranslated one (@code{po-kill-msgstr}). 3392 3393@end table 3394 3395@efindex u@r{, PO Mode command} 3396@efindex po-next-untranslated-entry@r{, PO Mode command} 3397@efindex U@r{, PO Mode command} 3398@efindex po-previous-untransted-entry@r{, PO Mode command} 3399The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{U} 3400(@code{po-previous-untransted-entry}) move forwards or backwards, 3401chasing for an untranslated entry. If none is found, the search is 3402extended and wraps around in the PO file buffer. 3403 3404@efindex k@r{, PO Mode command} 3405@efindex po-kill-msgstr@r{, PO Mode command} 3406An entry can be turned back into an untranslated entry by 3407merely emptying its translation, using the command @kbd{k} 3408(@code{po-kill-msgstr}). @xref{Modifying Translations}. 3409 3410Also, when time comes to quit working on a PO file buffer 3411with the @kbd{q} command, the translator is asked for confirmation, 3412if some untranslated string still exists. 3413 3414@node Obsolete Entries, Modifying Translations, Untranslated Entries, PO Mode 3415@subsection Obsolete Entries 3416@cindex obsolete entries 3417 3418By @dfn{obsolete} PO file entries, we mean those entries which are 3419commented out, usually by @code{msgmerge} when it found that the 3420translation is not needed anymore by the package being localized. 3421 3422The usual commands moving from entry to entry consider obsolete 3423entries on the same level as active entries. Obsolete entries are 3424easily recognizable by the fact that all their lines start with 3425@code{#}, even those lines containing @code{msgid} or @code{msgstr}. 3426 3427Commands exist for emptying the translation or reinitializing it 3428to the original untranslated string. Commands interfacing with the 3429kill ring may force some previously saved text into the translation. 3430The user may interactively edit the translation. All these commands 3431may apply to obsolete entries, carefully leaving the entry obsolete 3432after the fact. 3433 3434@emindex moving by obsolete entries 3435Moreover, some commands are more specifically related to obsolete 3436entry processing. 3437 3438@table @kbd 3439@item o 3440@efindex o@r{, PO Mode command} 3441Find the next obsolete entry (@code{po-next-obsolete-entry}). 3442 3443@item O 3444@efindex O@r{, PO Mode command} 3445Find the previous obsolete entry (@code{po-previous-obsolete-entry}). 3446 3447@item @key{DEL} 3448@efindex DEL@r{, PO Mode command} 3449Make an active entry obsolete, or zap out an obsolete entry 3450(@code{po-fade-out-entry}). 3451 3452@end table 3453 3454@efindex o@r{, PO Mode command} 3455@efindex po-next-obsolete-entry@r{, PO Mode command} 3456@efindex O@r{, PO Mode command} 3457@efindex po-previous-obsolete-entry@r{, PO Mode command} 3458The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{O} 3459(@code{po-previous-obsolete-entry}) move forwards or backwards, 3460chasing for an obsolete entry. If none is found, the search is 3461extended and wraps around in the PO file buffer. 3462 3463PO mode does not provide ways for un-commenting an obsolete entry 3464and making it active, because this would reintroduce an original 3465untranslated string which does not correspond to any marked string 3466in the program sources. This goes with the philosophy of never 3467introducing useless @code{msgid} values. 3468 3469@efindex DEL@r{, PO Mode command} 3470@efindex po-fade-out-entry@r{, PO Mode command} 3471@emindex obsolete active entry 3472@emindex comment out PO file entry 3473However, it is possible to comment out an active entry, so making 3474it obsolete. GNU @code{gettext} utilities will later react to the 3475disappearance of a translation by using the untranslated string. 3476The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry 3477a little further towards annihilation. If the entry is active (it is a 3478translated entry), then it is first made fuzzy. If it is already fuzzy, 3479then the entry is merely commented out, with confirmation. If the entry 3480is already obsolete, then it is completely deleted from the PO file. 3481It is easy to recycle the translation so deleted into some other PO file 3482entry, usually one which is untranslated. @xref{Modifying Translations}. 3483 3484Here is a quite interesting problem to solve for later development of 3485PO mode, for those nights you are not sleepy. The idea would be that 3486PO mode might become bright enough, one of these days, to make good 3487guesses at retrieving the most probable candidate, among all obsolete 3488entries, for initializing the translation of a newly appeared string. 3489I think it might be a quite hard problem to do this algorithmically, as 3490we have to develop good and efficient measures of string similarity. 3491Right now, PO mode completely lets the decision to the translator, 3492when the time comes to find the adequate obsolete translation, it 3493merely tries to provide handy tools for helping her to do so. 3494 3495@node Modifying Translations, Modifying Comments, Obsolete Entries, PO Mode 3496@subsection Modifying Translations 3497@cindex editing translations 3498@emindex editing translations 3499 3500PO mode prevents direct modification of the PO file, by the usual 3501means Emacs gives for altering a buffer's contents. By doing so, 3502it pretends helping the translator to avoid little clerical errors 3503about the overall file format, or the proper quoting of strings, 3504as those errors would be easily made. Other kinds of errors are 3505still possible, but some may be caught and diagnosed by the batch 3506validation process, which the translator may always trigger by the 3507@kbd{V} command. For all other errors, the translator has to rely on 3508her own judgment, and also on the linguistic reports submitted to her 3509by the users of the translated package, having the same mother tongue. 3510 3511When the time comes to create a translation, correct an error diagnosed 3512mechanically or reported by a user, the translators have to resort to 3513using the following commands for modifying the translations. 3514 3515@table @kbd 3516@item @key{RET} 3517@efindex RET@r{, PO Mode command} 3518Interactively edit the translation (@code{po-edit-msgstr}). 3519 3520@item @key{LFD} 3521@itemx C-j 3522@efindex LFD@r{, PO Mode command} 3523@efindex C-j@r{, PO Mode command} 3524Reinitialize the translation with the original, untranslated string 3525(@code{po-msgid-to-msgstr}). 3526 3527@item k 3528@efindex k@r{, PO Mode command} 3529Save the translation on the kill ring, and delete it (@code{po-kill-msgstr}). 3530 3531@item w 3532@efindex w@r{, PO Mode command} 3533Save the translation on the kill ring, without deleting it 3534(@code{po-kill-ring-save-msgstr}). 3535 3536@item y 3537@efindex y@r{, PO Mode command} 3538Replace the translation, taking the new from the kill ring 3539(@code{po-yank-msgstr}). 3540 3541@end table 3542 3543@efindex RET@r{, PO Mode command} 3544@efindex po-edit-msgstr@r{, PO Mode command} 3545The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs 3546window meant to edit in a new translation, or to modify an already existing 3547translation. The new window contains a copy of the translation taken from 3548the current PO file entry, all ready for edition, expunged of all quoting 3549marks, fully modifiable and with the complete extent of Emacs modifying 3550commands. When the translator is done with her modifications, she may use 3551@w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted 3552results, or @w{@kbd{C-c C-k}} to abort her modifications. @xref{Subedit}, 3553for more information. 3554 3555@efindex LFD@r{, PO Mode command} 3556@efindex C-j@r{, PO Mode command} 3557@efindex po-msgid-to-msgstr@r{, PO Mode command} 3558The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or 3559reinitializes the translation with the original string. This command is 3560normally used when the translator wants to redo a fresh translation of 3561the original string, disregarding any previous work. 3562 3563@evindex po-auto-edit-with-msgid@r{, PO Mode variable} 3564It is possible to arrange so, whenever editing an untranslated 3565entry, the @kbd{@key{LFD}} command be automatically executed. If you set 3566@code{po-auto-edit-with-msgid} to @code{t}, the translation gets 3567initialised with the original string, in case none exists already. 3568The default value for @code{po-auto-edit-with-msgid} is @code{nil}. 3569 3570@emindex starting a string translation 3571In fact, whether it is best to start a translation with an empty 3572string, or rather with a copy of the original string, is a matter of 3573taste or habit. Sometimes, the source language and the 3574target language are so different that is simply best to start writing 3575on an empty page. At other times, the source and target languages 3576are so close that it would be a waste to retype a number of words 3577already being written in the original string. A translator may also 3578like having the original string right under her eyes, as she will 3579progressively overwrite the original text with the translation, even 3580if this requires some extra editing work to get rid of the original. 3581 3582@emindex cut and paste for translated strings 3583@efindex k@r{, PO Mode command} 3584@efindex po-kill-msgstr@r{, PO Mode command} 3585@efindex w@r{, PO Mode command} 3586@efindex po-kill-ring-save-msgstr@r{, PO Mode command} 3587The command @kbd{k} (@code{po-kill-msgstr}) merely empties the 3588translation string, so turning the entry into an untranslated 3589one. But while doing so, its previous contents is put apart in 3590a special place, known as the kill ring. The command @kbd{w} 3591(@code{po-kill-ring-save-msgstr}) has also the effect of taking a 3592copy of the translation onto the kill ring, but it otherwise leaves 3593the entry alone, and does @emph{not} remove the translation from the 3594entry. Both commands use exactly the Emacs kill ring, which is shared 3595between buffers, and which is well known already to Emacs lovers. 3596 3597The translator may use @kbd{k} or @kbd{w} many times in the course 3598of her work, as the kill ring may hold several saved translations. 3599From the kill ring, strings may later be reinserted in various 3600Emacs buffers. In particular, the kill ring may be used for moving 3601translation strings between different entries of a single PO file 3602buffer, or if the translator is handling many such buffers at once, 3603even between PO files. 3604 3605To facilitate exchanges with buffers which are not in PO mode, the 3606translation string put on the kill ring by the @kbd{k} command is fully 3607unquoted before being saved: external quotes are removed, multi-line 3608strings are concatenated, and backslash escaped sequences are turned 3609into their corresponding characters. In the special case of obsolete 3610entries, the translation is also uncommented prior to saving. 3611 3612@efindex y@r{, PO Mode command} 3613@efindex po-yank-msgstr@r{, PO Mode command} 3614The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the 3615translation of the current entry by a string taken from the kill ring. 3616Following Emacs terminology, we then say that the replacement 3617string is @dfn{yanked} into the PO file buffer. 3618@xref{Yanking, , , emacs, The Emacs Editor}. 3619The first time @kbd{y} is used, the translation receives the value of 3620the most recent addition to the kill ring. If @kbd{y} is typed once 3621again, immediately, without intervening keystrokes, the translation 3622just inserted is taken away and replaced by the second most recent 3623addition to the kill ring. By repeating @kbd{y} many times in a row, 3624the translator may travel along the kill ring for saved strings, 3625until she finds the string she really wanted. 3626 3627When a string is yanked into a PO file entry, it is fully and 3628automatically requoted for complying with the format PO files should 3629have. Further, if the entry is obsolete, PO mode then appropriately 3630push the inserted string inside comments. Once again, translators 3631should not burden themselves with quoting considerations besides, of 3632course, the necessity of the translated string itself respective to 3633the program using it. 3634 3635Note that @kbd{k} or @kbd{w} are not the only commands pushing strings 3636on the kill ring, as almost any PO mode command replacing translation 3637strings (or the translator comments) automatically saves the old string 3638on the kill ring. The main exceptions to this general rule are the 3639yanking commands themselves. 3640 3641@emindex using obsolete translations to make new entries 3642To better illustrate the operation of killing and yanking, let's 3643use an actual example, taken from a common situation. When the 3644programmer slightly modifies some string right in the program, his 3645change is later reflected in the PO file by the appearance 3646of a new untranslated entry for the modified string, and the fact 3647that the entry translating the original or unmodified string becomes 3648obsolete. In many cases, the translator might spare herself some work 3649by retrieving the unmodified translation from the obsolete entry, 3650then initializing the untranslated entry @code{msgstr} field with 3651this retrieved translation. Once this done, the obsolete entry is 3652not wanted anymore, and may be safely deleted. 3653 3654When the translator finds an untranslated entry and suspects that a 3655slight variant of the translation exists, she immediately uses @kbd{m} 3656to mark the current entry location, then starts chasing obsolete 3657entries with @kbd{o}, hoping to find some translation corresponding 3658to the unmodified string. Once found, she uses the @kbd{@key{DEL}} command 3659for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills} 3660the translation, that is, pushes the translation on the kill ring. 3661Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y} 3662then @emph{yanks} the saved translation right into the @code{msgstr} 3663field. The translator is then free to use @kbd{@key{RET}} for fine 3664tuning the translation contents, and maybe to later use @kbd{u}, 3665then @kbd{m} again, for going on with the next untranslated string. 3666 3667When some sequence of keys has to be typed over and over again, the 3668translator may find it useful to become better acquainted with the Emacs 3669capability of learning these sequences and playing them back under request. 3670@xref{Keyboard Macros, , , emacs, The Emacs Editor}. 3671 3672@node Modifying Comments, Subedit, Modifying Translations, PO Mode 3673@subsection Modifying Comments 3674@cindex editing comments in PO files 3675@emindex editing comments 3676 3677Any translation work done seriously will raise many linguistic 3678difficulties, for which decisions have to be made, and the choices 3679further documented. These documents may be saved within the 3680PO file in form of translator comments, which the translator 3681is free to create, delete, or modify at will. These comments may 3682be useful to herself when she returns to this PO file after a while. 3683 3684Comments not having whitespace after the initial @samp{#}, for example, 3685those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator 3686comments, they are exclusively created by other @code{gettext} tools. 3687So, the commands below will never alter such system added comments, 3688they are not meant for the translator to modify. @xref{PO Files}. 3689 3690The following commands are somewhat similar to those modifying translations, 3691so the general indications given for those apply here. @xref{Modifying 3692Translations}. 3693 3694@table @kbd 3695 3696@item # 3697@efindex #@r{, PO Mode command} 3698Interactively edit the translator comments (@code{po-edit-comment}). 3699 3700@item K 3701@efindex K@r{, PO Mode command} 3702Save the translator comments on the kill ring, and delete it 3703(@code{po-kill-comment}). 3704 3705@item W 3706@efindex W@r{, PO Mode command} 3707Save the translator comments on the kill ring, without deleting it 3708(@code{po-kill-ring-save-comment}). 3709 3710@item Y 3711@efindex Y@r{, PO Mode command} 3712Replace the translator comments, taking the new from the kill ring 3713(@code{po-yank-comment}). 3714 3715@end table 3716 3717These commands parallel PO mode commands for modifying the translation 3718strings, and behave much the same way as they do, except that they handle 3719this part of PO file comments meant for translator usage, rather 3720than the translation strings. So, if the descriptions given below are 3721slightly succinct, it is because the full details have already been given. 3722@xref{Modifying Translations}. 3723 3724@efindex #@r{, PO Mode command} 3725@efindex po-edit-comment@r{, PO Mode command} 3726The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window 3727containing a copy of the translator comments on the current PO file entry. 3728If there are no such comments, PO mode understands that the translator wants 3729to add a comment to the entry, and she is presented with an empty screen. 3730Comment marks (@code{#}) and the space following them are automatically 3731removed before edition, and reinstated after. For translator comments 3732pertaining to obsolete entries, the uncommenting and recommenting operations 3733are done twice. Once in the editing window, the keys @w{@kbd{C-c C-c}} 3734allow the translator to tell she is finished with editing the comment. 3735@xref{Subedit}, for further details. 3736 3737@evindex po-subedit-mode-hook@r{, PO Mode variable} 3738Functions found on @code{po-subedit-mode-hook}, if any, are executed after 3739the string has been inserted in the edit buffer. 3740 3741@efindex K@r{, PO Mode command} 3742@efindex po-kill-comment@r{, PO Mode command} 3743@efindex W@r{, PO Mode command} 3744@efindex po-kill-ring-save-comment@r{, PO Mode command} 3745@efindex Y@r{, PO Mode command} 3746@efindex po-yank-comment@r{, PO Mode command} 3747The command @kbd{K} (@code{po-kill-comment}) gets rid of all 3748translator comments, while saving those comments on the kill ring. 3749The command @kbd{W} (@code{po-kill-ring-save-comment}) takes 3750a copy of the translator comments on the kill ring, but leaves 3751them undisturbed in the current entry. The command @kbd{Y} 3752(@code{po-yank-comment}) completely replaces the translator comments 3753by a string taken at the front of the kill ring. When this command 3754is immediately repeated, the comments just inserted are withdrawn, 3755and replaced by other strings taken along the kill ring. 3756 3757On the kill ring, all strings have the same nature. There is no 3758distinction between @emph{translation} strings and @emph{translator 3759comments} strings. So, for example, let's presume the translator 3760has just finished editing a translation, and wants to create a new 3761translator comment to document why the previous translation was 3762not good, just to remember what was the problem. Foreseeing that she 3763will do that in her documentation, the translator may want to quote 3764the previous translation in her translator comments. To do so, she 3765may initialize the translator comments with the previous translation, 3766still at the head of the kill ring. Because editing already pushed the 3767previous translation on the kill ring, she merely has to type @kbd{M-w} 3768prior to @kbd{#}, and the previous translation will be right there, 3769all ready for being introduced by some explanatory text. 3770 3771On the other hand, presume there are some translator comments already 3772and that the translator wants to add to those comments, instead 3773of wholly replacing them. Then, she should edit the comment right 3774away with @kbd{#}. Once inside the editing window, she can use the 3775regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y} 3776(@code{yank-pop}) to get the previous translation where she likes. 3777 3778@node Subedit, C Sources Context, Modifying Comments, PO Mode 3779@subsection Details of Sub Edition 3780@emindex subedit minor mode 3781 3782The PO subedit minor mode has a few peculiarities worth being described 3783in fuller detail. It installs a few commands over the usual editing set 3784of Emacs, which are described below. 3785 3786@table @kbd 3787@item C-c C-c 3788@efindex C-c C-c@r{, PO Mode command} 3789Complete edition (@code{po-subedit-exit}). 3790 3791@item C-c C-k 3792@efindex C-c C-k@r{, PO Mode command} 3793Abort edition (@code{po-subedit-abort}). 3794 3795@item C-c C-a 3796@efindex C-c C-a@r{, PO Mode command} 3797Consult auxiliary PO files (@code{po-subedit-cycle-auxiliary}). 3798 3799@end table 3800 3801@emindex exiting PO subedit 3802@efindex C-c C-c@r{, PO Mode command} 3803@efindex po-subedit-exit@r{, PO Mode command} 3804The window's contents represents a translation for a given message, 3805or a translator comment. The translator may modify this window to 3806her heart's content. Once this is done, the command @w{@kbd{C-c C-c}} 3807(@code{po-subedit-exit}) may be used to return the edited translation into 3808the PO file, replacing the original translation, even if it moved out of 3809sight or if buffers were switched. 3810 3811@efindex C-c C-k@r{, PO Mode command} 3812@efindex po-subedit-abort@r{, PO Mode command} 3813If the translator becomes unsatisfied with her translation or comment, 3814to the extent she prefers keeping what was existent prior to the 3815@kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}} 3816(@code{po-subedit-abort}) to merely get rid of edition, while preserving 3817the original translation or comment. Another way would be for her to exit 3818normally with @w{@kbd{C-c C-c}}, then type @code{U} once for undoing the 3819whole effect of last edition. 3820 3821@efindex C-c C-a@r{, PO Mode command} 3822@efindex po-subedit-cycle-auxiliary@r{, PO Mode command} 3823The command @w{@kbd{C-c C-a}} (@code{po-subedit-cycle-auxiliary}) 3824allows for glancing through translations 3825already achieved in other languages, directly while editing the current 3826translation. This may be quite convenient when the translator is fluent 3827at many languages, but of course, only makes sense when such completed 3828auxiliary PO files are already available to her (@pxref{Auxiliary}). 3829 3830Functions found on @code{po-subedit-mode-hook}, if any, are executed after 3831the string has been inserted in the edit buffer. 3832 3833While editing her translation, the translator should pay attention to not 3834inserting unwanted @kbd{@key{RET}} (newline) characters at the end of 3835the translated string if those are not meant to be there, or to removing 3836such characters when they are required. Since these characters are not 3837visible in the editing buffer, they are easily introduced by mistake. 3838To help her, @kbd{@key{RET}} automatically puts the character @code{<} 3839at the end of the string being edited, but this @code{<} is not really 3840part of the string. On exiting the editing window with @w{@kbd{C-c C-c}}, 3841PO mode automatically removes such @kbd{<} and all whitespace added after 3842it. If the translator adds characters after the terminating @code{<}, it 3843looses its delimiting property and integrally becomes part of the string. 3844If she removes the delimiting @code{<}, then the edited string is taken 3845@emph{as is}, with all trailing newlines, even if invisible. Also, if 3846the translated string ought to end itself with a genuine @code{<}, then 3847the delimiting @code{<} may not be removed; so the string should appear, 3848in the editing window, as ending with two @code{<} in a row. 3849 3850@emindex editing multiple entries 3851When a translation (or a comment) is being edited, the translator may move 3852the cursor back into the PO file buffer and freely move to other entries, 3853browsing at will. If, with an edition pending, the translator wanders in the 3854PO file buffer, she may decide to start modifying another entry. Each entry 3855being edited has its own subedit buffer. It is possible to simultaneously 3856edit the translation @emph{and} the comment of a single entry, or to 3857edit entries in different PO files, all at once. Typing @kbd{@key{RET}} 3858on a field already being edited merely resumes that particular edit. Yet, 3859the translator should better be comfortable at handling many Emacs windows! 3860 3861@emindex pending subedits 3862Pending subedits may be completed or aborted in any order, regardless 3863of how or when they were started. When many subedits are pending and the 3864translator asks for quitting the PO file (with the @kbd{q} command), subedits 3865are automatically resumed one at a time, so she may decide for each of them. 3866 3867@node C Sources Context, Auxiliary, Subedit, PO Mode 3868@subsection C Sources Context 3869@emindex consulting program sources 3870@emindex looking at the source to aid translation 3871@emindex use the source, Luke 3872 3873PO mode is particularly powerful when used with PO files 3874created through GNU @code{gettext} utilities, as those utilities 3875insert special comments in the PO files they generate. 3876Some of these special comments relate the PO file entry to 3877exactly where the untranslated string appears in the program sources. 3878 3879When the translator gets to an untranslated entry, she is fairly 3880often faced with an original string which is not as informative as 3881it normally should be, being succinct, cryptic, or otherwise ambiguous. 3882Before choosing how to translate the string, she needs to understand 3883better what the string really means and how tight the translation has 3884to be. Most of the time, when problems arise, the only way left to make 3885her judgment is looking at the true program sources from where this 3886string originated, searching for surrounding comments the programmer 3887might have put in there, and looking around for helping clues of 3888@emph{any} kind. 3889 3890Surely, when looking at program sources, the translator will receive 3891more help if she is a fluent programmer. However, even if she is 3892not versed in programming and feels a little lost in C code, the 3893translator should not be shy at taking a look, once in a while. 3894It is most probable that she will still be able to find some of the 3895hints she needs. She will learn quickly to not feel uncomfortable 3896in program code, paying more attention to programmer's comments, 3897variable and function names (if he dared choosing them well), and 3898overall organization, than to the program code itself. 3899 3900@emindex find source fragment for a PO file entry 3901The following commands are meant to help the translator at getting 3902program source context for a PO file entry. 3903 3904@table @kbd 3905@item s 3906@efindex s@r{, PO Mode command} 3907Resume the display of a program source context, or cycle through them 3908(@code{po-cycle-source-reference}). 3909 3910@item M-s 3911@efindex M-s@r{, PO Mode command} 3912Display of a program source context selected by menu 3913(@code{po-select-source-reference}). 3914 3915@item S 3916@efindex S@r{, PO Mode command} 3917Add a directory to the search path for source files 3918(@code{po-consider-source-path}). 3919 3920@item M-S 3921@efindex M-S@r{, PO Mode command} 3922Delete a directory from the search path for source files 3923(@code{po-ignore-source-path}). 3924 3925@end table 3926 3927@efindex s@r{, PO Mode command} 3928@efindex po-cycle-source-reference@r{, PO Mode command} 3929@efindex M-s@r{, PO Mode command} 3930@efindex po-select-source-reference@r{, PO Mode command} 3931The commands @kbd{s} (@code{po-cycle-source-reference}) and @kbd{M-s} 3932(@code{po-select-source-reference}) both open another window displaying 3933some source program file, and already positioned in such a way that 3934it shows an actual use of the string to be translated. By doing 3935so, the command gives source program context for the string. But if 3936the entry has no source context references, or if all references 3937are unresolved along the search path for program sources, then the 3938command diagnoses this as an error. 3939 3940Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays 3941in the PO file window. If the translator really wants to 3942get into the program source window, she ought to do it explicitly, 3943maybe by using command @kbd{O}. 3944 3945When @kbd{s} is typed for the first time, or for a PO file entry which 3946is different of the last one used for getting source context, then the 3947command reacts by giving the first context available for this entry, 3948if any. If some context has already been recently displayed for the 3949current PO file entry, and the translator wandered off to do other 3950things, typing @kbd{s} again will merely resume, in another window, 3951the context last displayed. In particular, if the translator moved 3952the cursor away from the context in the source file, the command will 3953bring the cursor back to the context. By using @kbd{s} many times 3954in a row, with no other commands intervening, PO mode will cycle to 3955the next available contexts for this particular entry, getting back 3956to the first context once the last has been shown. 3957 3958The command @kbd{M-s} behaves differently. Instead of cycling through 3959references, it lets the translator choose a particular reference among 3960many, and displays that reference. It is best used with completion, 3961if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in 3962response to the question, she will be offered a menu of all possible 3963references, as a reminder of which are the acceptable answers. 3964This command is useful only where there are really many contexts 3965available for a single string to translate. 3966 3967@efindex S@r{, PO Mode command} 3968@efindex po-consider-source-path@r{, PO Mode command} 3969@efindex M-S@r{, PO Mode command} 3970@efindex po-ignore-source-path@r{, PO Mode command} 3971Program source files are usually found relative to where the PO 3972file stands. As a special provision, when this fails, the file is 3973also looked for, but relative to the directory immediately above it. 3974Those two cases take proper care of most PO files. However, it might 3975happen that a PO file has been moved, or is edited in a different 3976place than its normal location. When this happens, the translator 3977should tell PO mode in which directory normally sits the genuine PO 3978file. Many such directories may be specified, and all together, they 3979constitute what is called the @dfn{search path} for program sources. 3980The command @kbd{S} (@code{po-consider-source-path}) is used to interactively 3981enter a new directory at the front of the search path, and the command 3982@kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion, 3983one of the directories she does not want anymore on the search path. 3984 3985@node Auxiliary, Compendium, C Sources Context, PO Mode 3986@subsection Consulting Auxiliary PO Files 3987@emindex consulting translations to other languages 3988 3989PO mode is able to help the knowledgeable translator, being fluent in 3990many languages, at taking advantage of translations already achieved 3991in other languages she just happens to know. It provides these other 3992language translations as additional context for her own work. Moreover, 3993it has features to ease the production of translations for many languages 3994at once, for translators preferring to work in this way. 3995 3996@cindex auxiliary PO file 3997@emindex auxiliary PO file 3998An @dfn{auxiliary} PO file is an existing PO file meant for the same 3999package the translator is working on, but targeted to a different mother 4000tongue language. Commands exist for declaring and handling auxiliary 4001PO files, and also for showing contexts for the entry under work. 4002 4003Here are the auxiliary file commands available in PO mode. 4004 4005@table @kbd 4006@item a 4007@efindex a@r{, PO Mode command} 4008Seek auxiliary files for another translation for the same entry 4009(@code{po-cycle-auxiliary}). 4010 4011@item C-c C-a 4012@efindex C-c C-a@r{, PO Mode command} 4013Switch to a particular auxiliary file (@code{po-select-auxiliary}). 4014 4015@item A 4016@efindex A@r{, PO Mode command} 4017Declare this PO file as an auxiliary file (@code{po-consider-as-auxiliary}). 4018 4019@item M-A 4020@efindex M-A@r{, PO Mode command} 4021Remove this PO file from the list of auxiliary files 4022(@code{po-ignore-as-auxiliary}). 4023 4024@end table 4025 4026@efindex A@r{, PO Mode command} 4027@efindex po-consider-as-auxiliary@r{, PO Mode command} 4028@efindex M-A@r{, PO Mode command} 4029@efindex po-ignore-as-auxiliary@r{, PO Mode command} 4030Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current 4031PO file to the list of auxiliary files, while command @kbd{M-A} 4032(@code{po-ignore-as-auxiliary} just removes it. 4033 4034@efindex a@r{, PO Mode command} 4035@efindex po-cycle-auxiliary@r{, PO Mode command} 4036The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO 4037files, round-robin, searching for a translated entry in some other language 4038having an @code{msgid} field identical as the one for the current entry. 4039The found PO file, if any, takes the place of the current PO file in 4040the display (its window gets on top). Before doing so, the current PO 4041file is also made into an auxiliary file, if not already. So, @kbd{a} 4042in this newly displayed PO file will seek another PO file, and so on, 4043so repeating @kbd{a} will eventually yield back the original PO file. 4044 4045@efindex C-c C-a@r{, PO Mode command} 4046@efindex po-select-auxiliary@r{, PO Mode command} 4047The command @kbd{C-c C-a} (@code{po-select-auxiliary}) asks the translator 4048for her choice of a particular auxiliary file, with completion, and 4049then switches to that selected PO file. The command also checks if 4050the selected file has an @code{msgid} field identical as the one for 4051the current entry, and if yes, this entry becomes current. Otherwise, 4052the cursor of the selected file is left undisturbed. 4053 4054For all this to work fully, auxiliary PO files will have to be normalized, 4055in that way that @code{msgid} fields should be written @emph{exactly} 4056the same way. It is possible to write @code{msgid} fields in various 4057ways for representing the same string, different writing would break the 4058proper behaviour of the auxiliary file commands of PO mode. This is not 4059expected to be much a problem in practice, as most existing PO files have 4060their @code{msgid} entries written by the same GNU @code{gettext} tools. 4061 4062@efindex normalize@r{, PO Mode command} 4063However, PO files initially created by PO mode itself, while marking 4064strings in source files, are normalised differently. So are PO 4065files resulting of the @samp{M-x normalize} command. Until these 4066discrepancies between PO mode and other GNU @code{gettext} tools get 4067fully resolved, the translator should stay aware of normalisation issues. 4068 4069@node Compendium, , Auxiliary, PO Mode 4070@subsection Using Translation Compendia 4071@emindex using translation compendia 4072 4073@cindex compendium 4074A @dfn{compendium} is a special PO file containing a set of 4075translations recurring in many different packages. The translator can 4076use gettext tools to build a new compendium, to add entries to her 4077compendium, and to initialize untranslated entries, or to update 4078already translated entries, from translations kept in the compendium. 4079 4080@menu 4081* Creating Compendia:: Merging translations for later use 4082* Using Compendia:: Using older translations if they fit 4083@end menu 4084 4085@node Creating Compendia, Using Compendia, Compendium, Compendium 4086@subsubsection Creating Compendia 4087@cindex creating compendia 4088@cindex compendium, creating 4089 4090Basically every PO file consisting of translated entries only can be 4091declared as a valid compendium. Often the translator wants to have 4092special compendia; let's consider two cases: @cite{concatenating PO 4093files} and @cite{extracting a message subset from a PO file}. 4094 4095@subsubsection Concatenate PO Files 4096 4097@cindex concatenating PO files into a compendium 4098@cindex accumulating translations 4099To concatenate several valid PO files into one compendium file you can 4100use @samp{msgcomm} or @samp{msgcat} (the latter preferred): 4101 4102@example 4103msgcat -o compendium.po file1.po file2.po 4104@end example 4105 4106By default, @code{msgcat} will accumulate divergent translations 4107for the same string. Those occurrences will be marked as @code{fuzzy} 4108and highly visible decorated; calling @code{msgcat} on 4109@file{file1.po}: 4110 4111@example 4112#: src/hello.c:200 4113#, c-format 4114msgid "Report bugs to <%s>.\n" 4115msgstr "Comunicar `bugs' a <%s>.\n" 4116@end example 4117 4118@noindent 4119and @file{file2.po}: 4120 4121@example 4122#: src/bye.c:100 4123#, c-format 4124msgid "Report bugs to <%s>.\n" 4125msgstr "Comunicar \"bugs\" a <%s>.\n" 4126@end example 4127 4128@noindent 4129will result in: 4130 4131@example 4132#: src/hello.c:200 src/bye.c:100 4133#, fuzzy, c-format 4134msgid "Report bugs to <%s>.\n" 4135msgstr "" 4136"#-#-#-#-# file1.po #-#-#-#-#\n" 4137"Comunicar `bugs' a <%s>.\n" 4138"#-#-#-#-# file2.po #-#-#-#-#\n" 4139"Comunicar \"bugs\" a <%s>.\n" 4140@end example 4141 4142@noindent 4143The translator will have to resolve this ``conflict'' manually; she 4144has to decide whether the first or the second version is appropriate 4145(or provide a new translation), to delete the ``marker lines'', and 4146finally to remove the @code{fuzzy} mark. 4147 4148If the translator knows in advance the first found translation of a 4149message is always the best translation she can make use to the 4150@samp{--use-first} switch: 4151 4152@example 4153msgcat --use-first -o compendium.po file1.po file2.po 4154@end example 4155 4156A good compendium file must not contain @code{fuzzy} or untranslated 4157entries. If input files are ``dirty'' you must preprocess the input 4158files or postprocess the result using @samp{msgattrib --translated --no-fuzzy}. 4159 4160@subsubsection Extract a Message Subset from a PO File 4161@cindex extracting parts of a PO file into a compendium 4162 4163Nobody wants to translate the same messages again and again; thus you 4164may wish to have a compendium file containing @file{getopt.c} messages. 4165 4166To extract a message subset (e.g., all @file{getopt.c} messages) from an 4167existing PO file into one compendium file you can use @samp{msggrep}: 4168 4169@example 4170msggrep --location src/getopt.c -o compendium.po file.po 4171@end example 4172 4173@node Using Compendia, , Creating Compendia, Compendium 4174@subsubsection Using Compendia 4175 4176You can use a compendium file to initialize a translation from scratch 4177or to update an already existing translation. 4178 4179@subsubsection Initialize a New Translation File 4180@cindex initialize translations from a compendium 4181 4182Since a PO file with translations does not exist the translator can 4183merely use @file{/dev/null} to fake the ``old'' translation file. 4184 4185@example 4186msgmerge --compendium compendium.po -o file.po /dev/null file.pot 4187@end example 4188 4189@subsubsection Update an Existing Translation File 4190@cindex update translations from a compendium 4191 4192Concatenate the compendium file(s) and the existing PO, merge the 4193result with the POT file and remove the obsolete entries (optional, 4194here done using @samp{sed}): 4195 4196@example 4197msgcat --use-first -o update.po compendium1.po compendium2.po file.po 4198msgmerge update.po file.pot | sed -e '/^#~/d' > file.po 4199@end example 4200 4201@node Manipulating, Binaries, Editing, Top 4202@chapter Manipulating PO Files 4203@cindex manipulating PO files 4204 4205Sometimes it is necessary to manipulate PO files in a way that is better 4206performed automatically than by hand. GNU @code{gettext} includes a 4207complete set of tools for this purpose. 4208 4209@cindex merging two PO files 4210When merging two packages into a single package, the resulting POT file 4211will be the concatenation of the two packages' POT files. Thus the 4212maintainer must concatenate the two existing package translations into 4213a single translation catalog, for each language. This is best performed 4214using @samp{msgcat}. It is then the translators' duty to deal with any 4215possible conflicts that arose during the merge. 4216 4217@cindex encoding conversion 4218When a translator takes over the translation job from another translator, 4219but she uses a different character encoding in her locale, she will 4220convert the catalog to her character encoding. This is best done through 4221the @samp{msgconv} program. 4222 4223When a maintainer takes a source file with tagged messages from another 4224package, he should also take the existing translations for this source 4225file (and not let the translators do the same job twice). One way to do 4226this is through @samp{msggrep}, another is to create a POT file for 4227that source file and use @samp{msgmerge}. 4228 4229@cindex dialect 4230@cindex orthography 4231When a translator wants to adjust some translation catalog for a special 4232dialect or orthography --- for example, German as written in Switzerland 4233versus German as written in Germany --- she needs to apply some text 4234processing to every message in the catalog. The tool for doing this is 4235@samp{msgfilter}. 4236 4237Another use of @code{msgfilter} is to produce approximately the POT file for 4238which a given PO file was made. This can be done through a filter command 4239like @samp{msgfilter sed -e d | sed -e '/^# /d'}. Note that the original 4240POT file may have had different comments and different plural message counts, 4241that's why it's better to use the original POT file if available. 4242 4243@cindex checking of translations 4244When a translator wants to check her translations, for example according 4245to orthography rules or using a non-interactive spell checker, she can do 4246so using the @samp{msgexec} program. 4247 4248@cindex duplicate elimination 4249When third party tools create PO or POT files, sometimes duplicates cannot 4250be avoided. But the GNU @code{gettext} tools give an error when they 4251encounter duplicate msgids in the same file and in the same domain. 4252To merge duplicates, the @samp{msguniq} program can be used. 4253 4254@samp{msgcomm} is a more general tool for keeping or throwing away 4255duplicates, occurring in different files. 4256 4257@samp{msgcmp} can be used to check whether a translation catalog is 4258completely translated. 4259 4260@cindex attributes, manipulating 4261@samp{msgattrib} can be used to select and extract only the fuzzy 4262or untranslated messages of a translation catalog. 4263 4264@samp{msgen} is useful as a first step for preparing English translation 4265catalogs. It copies each message's msgid to its msgstr. 4266 4267Finally, for those applications where all these various programs are not 4268sufficient, a library @samp{libgettextpo} is provided that can be used to 4269write other specialized programs that process PO files. 4270 4271@menu 4272* msgcat Invocation:: Invoking the @code{msgcat} Program 4273* msgconv Invocation:: Invoking the @code{msgconv} Program 4274* msggrep Invocation:: Invoking the @code{msggrep} Program 4275* msgfilter Invocation:: Invoking the @code{msgfilter} Program 4276* msguniq Invocation:: Invoking the @code{msguniq} Program 4277* msgcomm Invocation:: Invoking the @code{msgcomm} Program 4278* msgcmp Invocation:: Invoking the @code{msgcmp} Program 4279* msgattrib Invocation:: Invoking the @code{msgattrib} Program 4280* msgen Invocation:: Invoking the @code{msgen} Program 4281* msgexec Invocation:: Invoking the @code{msgexec} Program 4282* libgettextpo:: Writing your own programs that process PO files 4283@end menu 4284 4285@node msgcat Invocation, msgconv Invocation, Manipulating, Manipulating 4286@section Invoking the @code{msgcat} Program 4287 4288@include msgcat.texi 4289 4290@node msgconv Invocation, msggrep Invocation, msgcat Invocation, Manipulating 4291@section Invoking the @code{msgconv} Program 4292 4293@include msgconv.texi 4294 4295@node msggrep Invocation, msgfilter Invocation, msgconv Invocation, Manipulating 4296@section Invoking the @code{msggrep} Program 4297 4298@include msggrep.texi 4299 4300@node msgfilter Invocation, msguniq Invocation, msggrep Invocation, Manipulating 4301@section Invoking the @code{msgfilter} Program 4302 4303@include msgfilter.texi 4304 4305@node msguniq Invocation, msgcomm Invocation, msgfilter Invocation, Manipulating 4306@section Invoking the @code{msguniq} Program 4307 4308@include msguniq.texi 4309 4310@node msgcomm Invocation, msgcmp Invocation, msguniq Invocation, Manipulating 4311@section Invoking the @code{msgcomm} Program 4312 4313@include msgcomm.texi 4314 4315@node msgcmp Invocation, msgattrib Invocation, msgcomm Invocation, Manipulating 4316@section Invoking the @code{msgcmp} Program 4317 4318@include msgcmp.texi 4319 4320@node msgattrib Invocation, msgen Invocation, msgcmp Invocation, Manipulating 4321@section Invoking the @code{msgattrib} Program 4322 4323@include msgattrib.texi 4324 4325@node msgen Invocation, msgexec Invocation, msgattrib Invocation, Manipulating 4326@section Invoking the @code{msgen} Program 4327 4328@include msgen.texi 4329 4330@node msgexec Invocation, libgettextpo, msgen Invocation, Manipulating 4331@section Invoking the @code{msgexec} Program 4332 4333@include msgexec.texi 4334 4335@node libgettextpo, , msgexec Invocation, Manipulating 4336@section Writing your own programs that process PO files 4337 4338For the tasks for which a combination of @samp{msgattrib}, @samp{msgcat} etc. 4339is not sufficient, a set of C functions is provided in a library, to make it 4340possible to process PO files in your own programs. When you use this library, 4341you don't need to write routines to parse the PO file; instead, you retrieve 4342a pointer in memory to each of messages contained in the PO file. Functions 4343for writing PO files are not provided at this time. 4344 4345The functions are declared in the header file @samp{<gettext-po.h>}, and are 4346defined in a library called @samp{libgettextpo}. 4347 4348@deftp {Data Type} po_file_t 4349This is a pointer type that refers to the contents of a PO file, after it has 4350been read into memory. 4351@end deftp 4352 4353@deftp {Data Type} po_message_iterator_t 4354This is a pointer type that refers to an iterator that produces a sequence of 4355messages. 4356@end deftp 4357 4358@deftp {Data Type} po_message_t 4359This is a pointer type that refers to a message of a PO file, including its 4360translation. 4361@end deftp 4362 4363@deftypefun po_file_t po_file_read (const char *@var{filename}) 4364The @code{po_file_read} function reads a PO file into memory. The file name 4365is given as argument. The return value is a handle to the PO file's contents, 4366valid until @code{po_file_free} is called on it. In case of error, the return 4367value is @code{NULL}, and @code{errno} is set. 4368@end deftypefun 4369 4370@deftypefun void po_file_free (po_file_t @var{file}) 4371The @code{po_file_free} function frees a PO file's contents from memory, 4372including all messages that are only implicitly accessible through iterators. 4373@end deftypefun 4374 4375@deftypefun {const char * const *} po_file_domains (po_file_t @var{file}) 4376The @code{po_file_domains} function returns the domains for which the given 4377PO file has messages. The return value is a @code{NULL} terminated array 4378which is valid as long as the @var{file} handle is valid. For PO files which 4379contain no @samp{domain} directive, the return value contains only one domain, 4380namely the default domain @code{"messages"}. 4381@end deftypefun 4382 4383@deftypefun po_message_iterator_t po_message_iterator (po_file_t @var{file}, const char *@var{domain}) 4384The @code{po_message_iterator} returns an iterator that will produce the 4385messages of @var{file} that belong to the given @var{domain}. If @var{domain} 4386is @code{NULL}, the default domain is used instead. To list the messages, 4387use the function @code{po_next_message} repeatedly. 4388@end deftypefun 4389 4390@deftypefun void po_message_iterator_free (po_message_iterator_t @var{iterator}) 4391The @code{po_message_iterator_free} function frees an iterator previously 4392allocated through the @code{po_message_iterator} function. 4393@end deftypefun 4394 4395@deftypefun po_message_t po_next_message (po_message_iterator_t @var{iterator}) 4396The @code{po_next_message} function returns the next message from 4397@var{iterator} and advances the iterator. It returns @code{NULL} when the 4398iterator has reached the end of its message list. 4399@end deftypefun 4400 4401The following functions returns details of a @code{po_message_t}. Recall 4402that the results are valid as long as the @var{file} handle is valid. 4403 4404@deftypefun {const char *} po_message_msgid (po_message_t @var{message}) 4405The @code{po_message_msgid} function returns the @code{msgid} (untranslated 4406English string) of a message. This is guaranteed to be non-@code{NULL}. 4407@end deftypefun 4408 4409@deftypefun {const char *} po_message_msgid_plural (po_message_t @var{message}) 4410The @code{po_message_msgid_plural} function returns the @code{msgid_plural} 4411(untranslated English plural string) of a message with plurals, or @code{NULL} 4412for a message without plural. 4413@end deftypefun 4414 4415@deftypefun {const char *} po_message_msgstr (po_message_t @var{message}) 4416The @code{po_message_msgstr} function returns the @code{msgstr} (translation) 4417of a message. For an untranslated message, the return value is an empty 4418string. 4419@end deftypefun 4420 4421@deftypefun {const char *} po_message_msgstr_plural (po_message_t @var{message}, int @var{index}) 4422The @code{po_message_msgstr_plural} function returns the 4423@code{msgstr[@var{index}]} of a message with plurals, or @code{NULL} when 4424the @var{index} is out of range or for a message without plural. 4425@end deftypefun 4426 4427Here is an example code how these functions can be used. 4428 4429@example 4430const char *filename = @dots{}; 4431po_file_t file = po_file_read (filename); 4432 4433if (file == NULL) 4434 error (EXIT_FAILURE, errno, "couldn't open the PO file %s", filename); 4435@{ 4436 const char * const *domains = po_file_domains (file); 4437 const char * const *domainp; 4438 4439 for (domainp = domains; *domainp; domainp++) 4440 @{ 4441 const char *domain = *domainp; 4442 po_message_iterator_t iterator = po_message_iterator (file, domain); 4443 4444 for (;;) 4445 @{ 4446 po_message_t *message = po_next_message (iterator); 4447 4448 if (message == NULL) 4449 break; 4450 @{ 4451 const char *msgid = po_message_msgid (message); 4452 const char *msgstr = po_message_msgstr (message); 4453 4454 @dots{} 4455 @} 4456 @} 4457 po_message_iterator_free (iterator); 4458 @} 4459@} 4460po_file_free (file); 4461@end example 4462 4463@node Binaries, Programmers, Manipulating, Top 4464@chapter Producing Binary MO Files 4465 4466@c FIXME: Rewrite. 4467 4468@menu 4469* msgfmt Invocation:: Invoking the @code{msgfmt} Program 4470* msgunfmt Invocation:: Invoking the @code{msgunfmt} Program 4471* MO Files:: The Format of GNU MO Files 4472@end menu 4473 4474@node msgfmt Invocation, msgunfmt Invocation, Binaries, Binaries 4475@section Invoking the @code{msgfmt} Program 4476 4477@include msgfmt.texi 4478 4479@node msgunfmt Invocation, MO Files, msgfmt Invocation, Binaries 4480@section Invoking the @code{msgunfmt} Program 4481 4482@include msgunfmt.texi 4483 4484@node MO Files, , msgunfmt Invocation, Binaries 4485@section The Format of GNU MO Files 4486@cindex MO file's format 4487@cindex file format, @file{.mo} 4488 4489The format of the generated MO files is best described by a picture, 4490which appears below. 4491 4492@cindex magic signature of MO files 4493The first two words serve the identification of the file. The magic 4494number will always signal GNU MO files. The number is stored in the 4495byte order of the generating machine, so the magic number really is 4496two numbers: @code{0x950412de} and @code{0xde120495}. The second 4497word describes the current revision of the file format. For now the 4498revision is 0. This might change in future versions, and ensures 4499that the readers of MO files can distinguish new formats from old 4500ones, so that both can be handled correctly. The version is kept 4501separate from the magic number, instead of using different magic 4502numbers for different formats, mainly because @file{/etc/magic} is 4503not updated often. It might be better to have magic separated from 4504internal format version identification. 4505 4506Follow a number of pointers to later tables in the file, allowing 4507for the extension of the prefix part of MO files without having to 4508recompile programs reading them. This might become useful for later 4509inserting a few flag bits, indication about the charset used, new 4510tables, or other things. 4511 4512Then, at offset @var{O} and offset @var{T} in the picture, two tables 4513of string descriptors can be found. In both tables, each string 4514descriptor uses two 32 bits integers, one for the string length, 4515another for the offset of the string in the MO file, counting in bytes 4516from the start of the file. The first table contains descriptors 4517for the original strings, and is sorted so the original strings 4518are in increasing lexicographical order. The second table contains 4519descriptors for the translated strings, and is parallel to the first 4520table: to find the corresponding translation one has to access the 4521array slot in the second array with the same index. 4522 4523Having the original strings sorted enables the use of simple binary 4524search, for when the MO file does not contain an hashing table, or 4525for when it is not practical to use the hashing table provided in 4526the MO file. This also has another advantage, as the empty string 4527in a PO file GNU @code{gettext} is usually @emph{translated} into 4528some system information attached to that particular MO file, and the 4529empty string necessarily becomes the first in both the original and 4530translated tables, making the system information very easy to find. 4531 4532@cindex hash table, inside MO files 4533The size @var{S} of the hash table can be zero. In this case, the 4534hash table itself is not contained in the MO file. Some people might 4535prefer this because a precomputed hashing table takes disk space, and 4536does not win @emph{that} much speed. The hash table contains indices 4537to the sorted array of strings in the MO file. Conflict resolution is 4538done by double hashing. The precise hashing algorithm used is fairly 4539dependent on GNU @code{gettext} code, and is not documented here. 4540 4541As for the strings themselves, they follow the hash file, and each 4542is terminated with a @key{NUL}, and this @key{NUL} is not counted in 4543the length which appears in the string descriptor. The @code{msgfmt} 4544program has an option selecting the alignment for MO file strings. 4545With this option, each string is separately aligned so it starts at 4546an offset which is a multiple of the alignment value. On some RISC 4547machines, a correct alignment will speed things up. 4548 4549@cindex context, in MO files 4550Contexts are stored by storing the concatenation of the context, a 4551@key{EOT} byte, and the original string, instead of the original string. 4552 4553@cindex plural forms, in MO files 4554Plural forms are stored by letting the plural of the original string 4555follow the singular of the original string, separated through a 4556@key{NUL} byte. The length which appears in the string descriptor 4557includes both. However, only the singular of the original string 4558takes part in the hash table lookup. The plural variants of the 4559translation are all stored consecutively, separated through a 4560@key{NUL} byte. Here also, the length in the string descriptor 4561includes all of them. 4562 4563Nothing prevents a MO file from having embedded @key{NUL}s in strings. 4564However, the program interface currently used already presumes 4565that strings are @key{NUL} terminated, so embedded @key{NUL}s are 4566somewhat useless. But the MO file format is general enough so other 4567interfaces would be later possible, if for example, we ever want to 4568implement wide characters right in MO files, where @key{NUL} bytes may 4569accidentally appear. (No, we don't want to have wide characters in MO 4570files. They would make the file unnecessarily large, and the 4571@samp{wchar_t} type being platform dependent, MO files would be 4572platform dependent as well.) 4573 4574This particular issue has been strongly debated in the GNU 4575@code{gettext} development forum, and it is expectable that MO file 4576format will evolve or change over time. It is even possible that many 4577formats may later be supported concurrently. But surely, we have to 4578start somewhere, and the MO file format described here is a good start. 4579Nothing is cast in concrete, and the format may later evolve fairly 4580easily, so we should feel comfortable with the current approach. 4581 4582@example 4583@group 4584 byte 4585 +------------------------------------------+ 4586 0 | magic number = 0x950412de | 4587 | | 4588 4 | file format revision = 0 | 4589 | | 4590 8 | number of strings | == N 4591 | | 4592 12 | offset of table with original strings | == O 4593 | | 4594 16 | offset of table with translation strings | == T 4595 | | 4596 20 | size of hashing table | == S 4597 | | 4598 24 | offset of hashing table | == H 4599 | | 4600 . . 4601 . (possibly more entries later) . 4602 . . 4603 | | 4604 O | length & offset 0th string ----------------. 4605 O + 8 | length & offset 1st string ------------------. 4606 ... ... | | 4607O + ((N-1)*8)| length & offset (N-1)th string | | | 4608 | | | | 4609 T | length & offset 0th translation ---------------. 4610 T + 8 | length & offset 1st translation -----------------. 4611 ... ... | | | | 4612T + ((N-1)*8)| length & offset (N-1)th translation | | | | | 4613 | | | | | | 4614 H | start hash table | | | | | 4615 ... ... | | | | 4616 H + S * 4 | end hash table | | | | | 4617 | | | | | | 4618 | NUL terminated 0th string <----------------' | | | 4619 | | | | | 4620 | NUL terminated 1st string <------------------' | | 4621 | | | | 4622 ... ... | | 4623 | | | | 4624 | NUL terminated 0th translation <---------------' | 4625 | | | 4626 | NUL terminated 1st translation <-----------------' 4627 | | 4628 ... ... 4629 | | 4630 +------------------------------------------+ 4631@end group 4632@end example 4633 4634@node Programmers, Translators, Binaries, Top 4635@chapter The Programmer's View 4636 4637@c FIXME: Reorganize whole chapter. 4638 4639One aim of the current message catalog implementation provided by 4640GNU @code{gettext} was to use the system's message catalog handling, if the 4641installer wishes to do so. So we perhaps should first take a look at 4642the solutions we know about. The people in the POSIX committee did not 4643manage to agree on one of the semi-official standards which we'll 4644describe below. In fact they couldn't agree on anything, so they decided 4645only to include an example of an interface. The major Unix vendors 4646are split in the usage of the two most important specifications: X/Open's 4647catgets vs. Uniforum's gettext interface. We'll describe them both and 4648later explain our solution of this dilemma. 4649 4650@menu 4651* catgets:: About @code{catgets} 4652* gettext:: About @code{gettext} 4653* Comparison:: Comparing the two interfaces 4654* Using libintl.a:: Using libintl.a in own programs 4655* gettext grok:: Being a @code{gettext} grok 4656* Temp Programmers:: Temporary Notes for the Programmers Chapter 4657@end menu 4658 4659@node catgets, gettext, Programmers, Programmers 4660@section About @code{catgets} 4661@cindex @code{catgets}, X/Open specification 4662 4663The @code{catgets} implementation is defined in the X/Open Portability 4664Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the 4665process of creating this standard seemed to be too slow for some of 4666the Unix vendors so they created their implementations on preliminary 4667versions of the standard. Of course this leads again to problems while 4668writing platform independent programs: even the usage of @code{catgets} 4669does not guarantee a unique interface. 4670 4671Another, personal comment on this that only a bunch of committee members 4672could have made this interface. They never really tried to program 4673using this interface. It is a fast, memory-saving implementation, an 4674user can happily live with it. But programmers hate it (at least I and 4675some others do@dots{}) 4676 4677But we must not forget one point: after all the trouble with transferring 4678the rights on Unix(tm) they at last came to X/Open, the very same who 4679published this specification. This leads me to making the prediction 4680that this interface will be in future Unix standards (e.g.@: Spec1170) and 4681therefore part of all Unix implementation (implementations, which are 4682@emph{allowed} to wear this name). 4683 4684@menu 4685* Interface to catgets:: The interface 4686* Problems with catgets:: Problems with the @code{catgets} interface?! 4687@end menu 4688 4689@node Interface to catgets, Problems with catgets, catgets, catgets 4690@subsection The Interface 4691@cindex interface to @code{catgets} 4692 4693The interface to the @code{catgets} implementation consists of three 4694functions which correspond to those used in file access: @code{catopen} 4695to open the catalog for using, @code{catgets} for accessing the message 4696tables, and @code{catclose} for closing after work is done. Prototypes 4697for the functions and the needed definitions are in the 4698@code{<nl_types.h>} header file. 4699 4700@cindex @code{catopen}, a @code{catgets} function 4701@code{catopen} is used like in this: 4702 4703@example 4704nl_catd catd = catopen ("catalog_name", 0); 4705@end example 4706 4707The function takes as the argument the name of the catalog. This usual 4708refers to the name of the program or the package. The second parameter 4709is not further specified in the standard. I don't even know whether it 4710is implemented consistently among various systems. So the common advice 4711is to use @code{0} as the value. The return value is a handle to the 4712message catalog, equivalent to handles to file returned by @code{open}. 4713 4714@cindex @code{catgets}, a @code{catgets} function 4715This handle is of course used in the @code{catgets} function which can 4716be used like this: 4717 4718@example 4719char *translation = catgets (catd, set_no, msg_id, "original string"); 4720@end example 4721 4722The first parameter is this catalog descriptor. The second parameter 4723specifies the set of messages in this catalog, in which the message 4724described by @code{msg_id} is obtained. @code{catgets} therefore uses a 4725three-stage addressing: 4726 4727@display 4728catalog name @result{} set number @result{} message ID @result{} translation 4729@end display 4730 4731@c Anybody else loving Haskell??? :-) -- Uli 4732 4733The fourth argument is not used to address the translation. It is given 4734as a default value in case when one of the addressing stages fail. One 4735important thing to remember is that although the return type of catgets 4736is @code{char *} the resulting string @emph{must not} be changed. It 4737should better be @code{const char *}, but the standard is published in 47381988, one year before ANSI C. 4739 4740@noindent 4741@cindex @code{catclose}, a @code{catgets} function 4742The last of these functions is used and behaves as expected: 4743 4744@example 4745catclose (catd); 4746@end example 4747 4748After this no @code{catgets} call using the descriptor is legal anymore. 4749 4750@node Problems with catgets, , Interface to catgets, catgets 4751@subsection Problems with the @code{catgets} Interface?! 4752@cindex problems with @code{catgets} interface 4753 4754Now that this description seemed to be really easy --- where are the 4755problems we speak of? In fact the interface could be used in a 4756reasonable way, but constructing the message catalogs is a pain. The 4757reason for this lies in the third argument of @code{catgets}: the unique 4758message ID. This has to be a numeric value for all messages in a single 4759set. Perhaps you could imagine the problems keeping such a list while 4760changing the source code. Add a new message here, remove one there. Of 4761course there have been developed a lot of tools helping to organize this 4762chaos but one as the other fails in one aspect or the other. We don't 4763want to say that the other approach has no problems but they are far 4764more easy to manage. 4765 4766@node gettext, Comparison, catgets, Programmers 4767@section About @code{gettext} 4768@cindex @code{gettext}, a programmer's view 4769 4770The definition of the @code{gettext} interface comes from a Uniforum 4771proposal. It was submitted there by Sun, who had implemented the 4772@code{gettext} function in SunOS 4, around 1990. Nowadays, the 4773@code{gettext} interface is specified by the OpenI18N standard. 4774 4775The main point about this solution is that it does not follow the 4776method of normal file handling (open-use-close) and that it does not 4777burden the programmer with so many tasks, especially the unique key handling. 4778Of course here also a unique key is needed, but this key is the message 4779itself (how long or short it is). See @ref{Comparison} for a more 4780detailed comparison of the two methods. 4781 4782The following section contains a rather detailed description of the 4783interface. We make it that detailed because this is the interface 4784we chose for the GNU @code{gettext} Library. Programmers interested 4785in using this library will be interested in this description. 4786 4787@menu 4788* Interface to gettext:: The interface 4789* Ambiguities:: Solving ambiguities 4790* Locating Catalogs:: Locating message catalog files 4791* Charset conversion:: How to request conversion to Unicode 4792* Contexts:: Solving ambiguities in GUI programs 4793* Plural forms:: Additional functions for handling plurals 4794* Optimized gettext:: Optimization of the *gettext functions 4795@end menu 4796 4797@node Interface to gettext, Ambiguities, gettext, gettext 4798@subsection The Interface 4799@cindex @code{gettext} interface 4800 4801The minimal functionality an interface must have is a) to select a 4802domain the strings are coming from (a single domain for all programs is 4803not reasonable because its construction and maintenance is difficult, 4804perhaps impossible) and b) to access a string in a selected domain. 4805 4806This is principally the description of the @code{gettext} interface. It 4807has a global domain which unqualified usages reference. Of course this 4808domain is selectable by the user. 4809 4810@example 4811char *textdomain (const char *domain_name); 4812@end example 4813 4814This provides the possibility to change or query the current status of 4815the current global domain of the @code{LC_MESSAGE} category. The 4816argument is a null-terminated string, whose characters must be legal in 4817the use in filenames. If the @var{domain_name} argument is @code{NULL}, 4818the function returns the current value. If no value has been set 4819before, the name of the default domain is returned: @emph{messages}. 4820Please note that although the return value of @code{textdomain} is of 4821type @code{char *} no changing is allowed. It is also important to know 4822that no checks of the availability are made. If the name is not 4823available you will see this by the fact that no translations are provided. 4824 4825@noindent 4826To use a domain set by @code{textdomain} the function 4827 4828@example 4829char *gettext (const char *msgid); 4830@end example 4831 4832@noindent 4833is to be used. This is the simplest reasonable form one can imagine. 4834The translation of the string @var{msgid} is returned if it is available 4835in the current domain. If it is not available, the argument itself is 4836returned. If the argument is @code{NULL} the result is undefined. 4837 4838One thing which should come into mind is that no explicit dependency to 4839the used domain is given. The current value of the domain for the 4840@code{LC_MESSAGES} locale is used. If this changes between two 4841executions of the same @code{gettext} call in the program, both calls 4842reference a different message catalog. 4843 4844For the easiest case, which is normally used in internationalized 4845packages, once at the beginning of execution a call to @code{textdomain} 4846is issued, setting the domain to a unique name, normally the package 4847name. In the following code all strings which have to be translated are 4848filtered through the gettext function. That's all, the package speaks 4849your language. 4850 4851@node Ambiguities, Locating Catalogs, Interface to gettext, gettext 4852@subsection Solving Ambiguities 4853@cindex several domains 4854@cindex domain ambiguities 4855@cindex large package 4856 4857While this single name domain works well for most applications there 4858might be the need to get translations from more than one domain. Of 4859course one could switch between different domains with calls to 4860@code{textdomain}, but this is really not convenient nor is it fast. A 4861possible situation could be one case subject to discussion during this 4862writing: all 4863error messages of functions in the set of common used functions should 4864go into a separate domain @code{error}. By this mean we would only need 4865to translate them once. 4866Another case are messages from a library, as these @emph{have} to be 4867independent of the current domain set by the application. 4868 4869@noindent 4870For this reasons there are two more functions to retrieve strings: 4871 4872@example 4873char *dgettext (const char *domain_name, const char *msgid); 4874char *dcgettext (const char *domain_name, const char *msgid, 4875 int category); 4876@end example 4877 4878Both take an additional argument at the first place, which corresponds 4879to the argument of @code{textdomain}. The third argument of 4880@code{dcgettext} allows to use another locale but @code{LC_MESSAGES}. 4881But I really don't know where this can be useful. If the 4882@var{domain_name} is @code{NULL} or @var{category} has an value beside 4883the known ones, the result is undefined. It should also be noted that 4884this function is not part of the second known implementation of this 4885function family, the one found in Solaris. 4886 4887A second ambiguity can arise by the fact, that perhaps more than one 4888domain has the same name. This can be solved by specifying where the 4889needed message catalog files can be found. 4890 4891@example 4892char *bindtextdomain (const char *domain_name, 4893 const char *dir_name); 4894@end example 4895 4896Calling this function binds the given domain to a file in the specified 4897directory (how this file is determined follows below). Especially a 4898file in the systems default place is not favored against the specified 4899file anymore (as it would be by solely using @code{textdomain}). A 4900@code{NULL} pointer for the @var{dir_name} parameter returns the binding 4901associated with @var{domain_name}. If @var{domain_name} itself is 4902@code{NULL} nothing happens and a @code{NULL} pointer is returned. Here 4903again as for all the other functions is true that none of the return 4904value must be changed! 4905 4906It is important to remember that relative path names for the 4907@var{dir_name} parameter can be trouble. Since the path is always 4908computed relative to the current directory different results will be 4909achieved when the program executes a @code{chdir} command. Relative 4910paths should always be avoided to avoid dependencies and 4911unreliabilities. 4912 4913@node Locating Catalogs, Charset conversion, Ambiguities, gettext 4914@subsection Locating Message Catalog Files 4915@cindex message catalog files location 4916 4917Because many different languages for many different packages have to be 4918stored we need some way to add these information to file message catalog 4919files. The way usually used in Unix environments is have this encoding 4920in the file name. This is also done here. The directory name given in 4921@code{bindtextdomain}s second argument (or the default directory), 4922followed by the value and name of the locale and the domain name are 4923concatenated: 4924 4925@example 4926@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo 4927@end example 4928 4929The default value for @var{dir_name} is system specific. For the GNU 4930library, and for packages adhering to its conventions, it's: 4931@example 4932/usr/local/share/locale 4933@end example 4934 4935@noindent 4936@var{locale} is the value of the locale whose name is this 4937@code{LC_@var{category}}. For @code{gettext} and @code{dgettext} this 4938@code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some 4939system, eg Ultrix, don't have @code{LC_MESSAGES}. Here we use a more or 4940less arbitrary value for it, namely 1729, the smallest positive integer 4941which can be represented in two different ways as the sum of two cubes.} 4942The value of the locale is determined through 4943@code{setlocale (LC_@var{category}, NULL)}. 4944@footnote{When the system does not support @code{setlocale} its behavior 4945in setting the locale values is simulated by looking at the environment 4946variables.} 4947@code{dcgettext} specifies the locale category by the third argument. 4948 4949@node Charset conversion, Contexts, Locating Catalogs, gettext 4950@subsection How to specify the output character set @code{gettext} uses 4951@cindex charset conversion at runtime 4952@cindex encoding conversion at runtime 4953 4954@code{gettext} not only looks up a translation in a message catalog. It 4955also converts the translation on the fly to the desired output character 4956set. This is useful if the user is working in a different character set 4957than the translator who created the message catalog, because it avoids 4958distributing variants of message catalogs which differ only in the 4959character set. 4960 4961The output character set is, by default, the value of @code{nl_langinfo 4962(CODESET)}, which depends on the @code{LC_CTYPE} part of the current 4963locale. But programs which store strings in a locale independent way 4964(e.g.@: UTF-8) can request that @code{gettext} and related functions 4965return the translations in that encoding, by use of the 4966@code{bind_textdomain_codeset} function. 4967 4968Note that the @var{msgid} argument to @code{gettext} is not subject to 4969character set conversion. Also, when @code{gettext} does not find a 4970translation for @var{msgid}, it returns @var{msgid} unchanged -- 4971independently of the current output character set. It is therefore 4972recommended that all @var{msgid}s be US-ASCII strings. 4973 4974@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset}) 4975The @code{bind_textdomain_codeset} function can be used to specify the 4976output character set for message catalogs for domain @var{domainname}. 4977The @var{codeset} argument must be a valid codeset name which can be used 4978for the @code{iconv_open} function, or a null pointer. 4979 4980If the @var{codeset} parameter is the null pointer, 4981@code{bind_textdomain_codeset} returns the currently selected codeset 4982for the domain with the name @var{domainname}. It returns @code{NULL} if 4983no codeset has yet been selected. 4984 4985The @code{bind_textdomain_codeset} function can be used several times. 4986If used multiple times with the same @var{domainname} argument, the 4987later call overrides the settings made by the earlier one. 4988 4989The @code{bind_textdomain_codeset} function returns a pointer to a 4990string containing the name of the selected codeset. The string is 4991allocated internally in the function and must not be changed by the 4992user. If the system went out of core during the execution of 4993@code{bind_textdomain_codeset}, the return value is @code{NULL} and the 4994global variable @var{errno} is set accordingly. 4995@end deftypefun 4996 4997@node Contexts, Plural forms, Charset conversion, gettext 4998@subsection Using contexts for solving ambiguities 4999@cindex context 5000@cindex GUI programs 5001@cindex translating menu entries 5002@cindex menu entries 5003 5004One place where the @code{gettext} functions, if used normally, have big 5005problems is within programs with graphical user interfaces (GUIs). The 5006problem is that many of the strings which have to be translated are very 5007short. They have to appear in pull-down menus which restricts the 5008length. But strings which are not containing entire sentences or at 5009least large fragments of a sentence may appear in more than one 5010situation in the program but might have different translations. This is 5011especially true for the one-word strings which are frequently used in 5012GUI programs. 5013 5014As a consequence many people say that the @code{gettext} approach is 5015wrong and instead @code{catgets} should be used which indeed does not 5016have this problem. But there is a very simple and powerful method to 5017handle this kind of problems with the @code{gettext} functions. 5018 5019Contexts can be added to strings to be translated. A context dependent 5020translation lookup is when a translation for a given string is searched, 5021that is limited to a given context. The translation for the same string 5022in a different context can be different. The different translations of 5023the same string in different contexts can be stored in the in the same 5024MO file, and can be edited by the translator in the same PO file. 5025 5026The @file{gettext.h} include file contains the lookup macros for strings 5027with contexts. They are implemented as thin macros and inline functions 5028over the functions from @code{<libintl.h>}. 5029 5030@findex pgettext 5031@example 5032const char *pgettext (const char *msgctxt, const char *msgid); 5033@end example 5034 5035In a call of this macro, @var{msgctxt} and @var{msgid} must be string 5036literals. The macro returns the translation of @var{msgid}, restricted 5037to the context given by @var{msgctxt}. 5038 5039The @var{msgctxt} string is visible in the PO file to the translator. 5040You should try to make it somehow canonical and never changing. Because 5041every time you change an @var{msgctxt}, the translator will have to review 5042the translation of @var{msgid}. 5043 5044Finding a canonical @var{msgctxt} string that doesn't change over time can 5045be hard. But you shouldn't use the file name or class name containing the 5046@code{pgettext} call -- because it is a common development task to rename 5047a file or a class, and it shouldn't cause translator work. Also you shouldn't 5048use a comment in the form of a complete English sentence as @var{msgctxt} -- 5049because orthography or grammar changes are often applied to such sentences, 5050and again, it shouldn't force the translator to do a review. 5051 5052The @samp{p} in @samp{pgettext} stands for ``particular'': @code{pgettext} 5053fetches a particular translation of the @var{msgid}. 5054 5055@findex dpgettext 5056@findex dcpgettext 5057@example 5058const char *dpgettext (const char *domain_name, 5059 const char *msgctxt, const char *msgid); 5060const char *dcpgettext (const char *domain_name, 5061 const char *msgctxt, const char *msgid, 5062 int category); 5063@end example 5064 5065These are generalizations of @code{pgettext}. They behave similarly to 5066@code{dgettext} and @code{dcgettext}, respectively. The @var{domain_name} 5067argument defines the translation domain. The @var{category} argument 5068allows to use another locale facet than @code{LC_MESSAGES}. 5069 5070As as example consider the following fictional situation. A GUI program 5071has a menu bar with the following entries: 5072 5073@smallexample 5074+------------+------------+--------------------------------------+ 5075| File | Printer | | 5076+------------+------------+--------------------------------------+ 5077| Open | | Select | 5078| New | | Open | 5079+----------+ | Connect | 5080 +----------+ 5081@end smallexample 5082 5083To have the strings @code{File}, @code{Printer}, @code{Open}, 5084@code{New}, @code{Select}, and @code{Connect} translated there has to be 5085at some point in the code a call to a function of the @code{gettext} 5086family. But in two places the string passed into the function would be 5087@code{Open}. The translations might not be the same and therefore we 5088are in the dilemma described above. 5089 5090What distinguishes the two places is the menu path from the menu root to 5091the particular menu entries: 5092 5093@smallexample 5094Menu|File 5095Menu|Printer 5096Menu|File|Open 5097Menu|File|New 5098Menu|Printer|Select 5099Menu|Printer|Open 5100Menu|Printer|Connect 5101@end smallexample 5102 5103The context is thus the menu path without its last part. So, the calls 5104look like this: 5105 5106@smallexample 5107pgettext ("Menu|", "File") 5108pgettext ("Menu|", "Printer") 5109pgettext ("Menu|File|", "Open") 5110pgettext ("Menu|File|", "New") 5111pgettext ("Menu|Printer|", "Select") 5112pgettext ("Menu|Printer|", "Open") 5113pgettext ("Menu|Printer|", "Connect") 5114@end smallexample 5115 5116Whether or not to use the @samp{|} character at the end of the context is a 5117matter of style. 5118 5119For more complex cases, where the @var{msgctxt} or @var{msgid} are not 5120string literals, more general macros are available: 5121 5122@findex pgettext_expr 5123@findex dpgettext_expr 5124@findex dcpgettext_expr 5125@example 5126const char *pgettext_expr (const char *msgctxt, const char *msgid); 5127const char *dpgettext_expr (const char *domain_name, 5128 const char *msgctxt, const char *msgid); 5129const char *dcpgettext_expr (const char *domain_name, 5130 const char *msgctxt, const char *msgid, 5131 int category); 5132@end example 5133 5134Here @var{msgctxt} and @var{msgid} can be arbitrary string-valued expressions. 5135These macros are more general. But in the case that both argument expressions 5136are string literals, the macros without the @samp{_expr} suffix are more 5137efficient. 5138 5139@node Plural forms, Optimized gettext, Contexts, gettext 5140@subsection Additional functions for plural forms 5141@cindex plural forms 5142 5143The functions of the @code{gettext} family described so far (and all the 5144@code{catgets} functions as well) have one problem in the real world 5145which have been neglected completely in all existing approaches. What 5146is meant here is the handling of plural forms. 5147 5148Looking through Unix source code before the time anybody thought about 5149internationalization (and, sadly, even afterwards) one can often find 5150code similar to the following: 5151 5152@smallexample 5153 printf ("%d file%s deleted", n, n == 1 ? "" : "s"); 5154@end smallexample 5155 5156@noindent 5157After the first complaints from people internationalizing the code people 5158either completely avoided formulations like this or used strings like 5159@code{"file(s)"}. Both look unnatural and should be avoided. First 5160tries to solve the problem correctly looked like this: 5161 5162@smallexample 5163 if (n == 1) 5164 printf ("%d file deleted", n); 5165 else 5166 printf ("%d files deleted", n); 5167@end smallexample 5168 5169But this does not solve the problem. It helps languages where the 5170plural form of a noun is not simply constructed by adding an 5171@ifhtml 5172‘s’ 5173@end ifhtml 5174@ifnothtml 5175`s' 5176@end ifnothtml 5177but that is all. Once again people fell into the trap of believing the 5178rules their language is using are universal. But the handling of plural 5179forms differs widely between the language families. For example, 5180Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports: 5181 5182@quotation 5183In Polish we use e.g.@: plik (file) this way: 5184@example 51851 plik 51862,3,4 pliki 51875-21 pliko'w 518822-24 pliki 518925-31 pliko'w 5190@end example 5191and so on (o' means 8859-2 oacute which should be rather okreska, 5192similar to aogonek). 5193@end quotation 5194 5195There are two things which can differ between languages (and even inside 5196language families); 5197 5198@itemize @bullet 5199@item 5200The form how plural forms are built differs. This is a problem with 5201languages which have many irregularities. German, for instance, is a 5202drastic case. Though English and German are part of the same language 5203family (Germanic), the almost regular forming of plural noun forms 5204(appending an 5205@ifhtml 5206‘s’) 5207@end ifhtml 5208@ifnothtml 5209`s') 5210@end ifnothtml 5211is hardly found in German. 5212 5213@item 5214The number of plural forms differ. This is somewhat surprising for 5215those who only have experiences with Romanic and Germanic languages 5216since here the number is the same (there are two). 5217 5218But other language families have only one form or many forms. More 5219information on this in an extra section. 5220@end itemize 5221 5222The consequence of this is that application writers should not try to 5223solve the problem in their code. This would be localization since it is 5224only usable for certain, hardcoded language environments. Instead the 5225extended @code{gettext} interface should be used. 5226 5227These extra functions are taking instead of the one key string two 5228strings and a numerical argument. The idea behind this is that using 5229the numerical argument and the first string as a key, the implementation 5230can select using rules specified by the translator the right plural 5231form. The two string arguments then will be used to provide a return 5232value in case no message catalog is found (similar to the normal 5233@code{gettext} behavior). In this case the rules for Germanic language 5234is used and it is assumed that the first string argument is the singular 5235form, the second the plural form. 5236 5237This has the consequence that programs without language catalogs can 5238display the correct strings only if the program itself is written using 5239a Germanic language. This is a limitation but since the GNU C library 5240(as well as the GNU @code{gettext} package) are written as part of the 5241GNU package and the coding standards for the GNU project require program 5242being written in English, this solution nevertheless fulfills its 5243purpose. 5244 5245@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}) 5246The @code{ngettext} function is similar to the @code{gettext} function 5247as it finds the message catalogs in the same way. But it takes two 5248extra arguments. The @var{msgid1} parameter must contain the singular 5249form of the string to be converted. It is also used as the key for the 5250search in the catalog. The @var{msgid2} parameter is the plural form. 5251The parameter @var{n} is used to determine the plural form. If no 5252message catalog is found @var{msgid1} is returned if @code{n == 1}, 5253otherwise @code{msgid2}. 5254 5255An example for the use of this function is: 5256 5257@smallexample 5258printf (ngettext ("%d file removed", "%d files removed", n), n); 5259@end smallexample 5260 5261Please note that the numeric value @var{n} has to be passed to the 5262@code{printf} function as well. It is not sufficient to pass it only to 5263@code{ngettext}. 5264 5265In the English singular case, the number -- always 1 -- can be replaced with 5266"one": 5267 5268@smallexample 5269printf (ngettext ("One file removed", "%d files removed", n), n); 5270@end smallexample 5271 5272@noindent 5273This works because the @samp{printf} function discards excess arguments that 5274are not consumed by the format string. 5275 5276It is also possible to use this function when the strings don't contain a 5277cardinal number: 5278 5279@smallexample 5280puts (ngettext ("Delete the selected file?", 5281 "Delete the selected files?", 5282 n)); 5283@end smallexample 5284 5285In this case the number @var{n} is only used to choose the plural form. 5286@end deftypefun 5287 5288@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}) 5289The @code{dngettext} is similar to the @code{dgettext} function in the 5290way the message catalog is selected. The difference is that it takes 5291two extra parameter to provide the correct plural form. These two 5292parameters are handled in the same way @code{ngettext} handles them. 5293@end deftypefun 5294 5295@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category}) 5296The @code{dcngettext} is similar to the @code{dcgettext} function in the 5297way the message catalog is selected. The difference is that it takes 5298two extra parameter to provide the correct plural form. These two 5299parameters are handled in the same way @code{ngettext} handles them. 5300@end deftypefun 5301 5302Now, how do these functions solve the problem of the plural forms? 5303Without the input of linguists (which was not available) it was not 5304possible to determine whether there are only a few different forms in 5305which plural forms are formed or whether the number can increase with 5306every new supported language. 5307 5308Therefore the solution implemented is to allow the translator to specify 5309the rules of how to select the plural form. Since the formula varies 5310with every language this is the only viable solution except for 5311hardcoding the information in the code (which still would require the 5312possibility of extensions to not prevent the use of new languages). 5313 5314@cindex specifying plural form in a PO file 5315@kwindex nplurals@r{, in a PO file header} 5316@kwindex plural@r{, in a PO file header} 5317The information about the plural form selection has to be stored in the 5318header entry of the PO file (the one with the empty @code{msgid} string). 5319The plural form information looks like this: 5320 5321@smallexample 5322Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1; 5323@end smallexample 5324 5325The @code{nplurals} value must be a decimal number which specifies how 5326many different plural forms exist for this language. The string 5327following @code{plural} is an expression which is using the C language 5328syntax. Exceptions are that no negative numbers are allowed, numbers 5329must be decimal, and the only variable allowed is @code{n}. Spaces are 5330allowed in the expression, but backslash-newlines are not; in the 5331examples below the backslash-newlines are present for formatting purposes 5332only. This expression will be evaluated whenever one of the functions 5333@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The 5334numeric value passed to these functions is then substituted for all uses 5335of the variable @code{n} in the expression. The resulting value then 5336must be greater or equal to zero and smaller than the value given as the 5337value of @code{nplurals}. 5338 5339@noindent 5340@cindex plural form formulas 5341The following rules are known at this point. The language with families 5342are listed. But this does not necessarily mean the information can be 5343generalized for the whole family (as can be easily seen in the table 5344below).@footnote{Additions are welcome. Send appropriate information to 5345@email{bug-gnu-gettext@@gnu.org} and @email{bug-glibc-manual@@gnu.org}.} 5346 5347@table @asis 5348@item Only one form: 5349Some languages only require one single form. There is no distinction 5350between the singular and plural form. An appropriate header entry 5351would look like this: 5352 5353@smallexample 5354Plural-Forms: nplurals=1; plural=0; 5355@end smallexample 5356 5357@noindent 5358Languages with this property include: 5359 5360@table @asis 5361@item Asian family 5362Japanese, Korean, Vietnamese 5363@item Turkic/Altaic family 5364Turkish 5365@end table 5366 5367@item Two forms, singular used for one only 5368This is the form used in most existing programs since it is what English 5369is using. A header entry would look like this: 5370 5371@smallexample 5372Plural-Forms: nplurals=2; plural=n != 1; 5373@end smallexample 5374 5375(Note: this uses the feature of C expressions that boolean expressions 5376have to value zero or one.) 5377 5378@noindent 5379Languages with this property include: 5380 5381@table @asis 5382@item Germanic family 5383Danish, Dutch, English, Faroese, German, Norwegian, Swedish 5384@item Finno-Ugric family 5385Estonian, Finnish 5386@item Latin/Greek family 5387Greek 5388@item Semitic family 5389Hebrew 5390@item Romanic family 5391Italian, Portuguese, Spanish 5392@item Artificial 5393Esperanto 5394@end table 5395 5396@noindent 5397Another language using the same header entry is: 5398 5399@table @asis 5400@item Finno-Ugric family 5401Hungarian 5402@end table 5403 5404Hungarian does not appear to have a plural if you look at sentences involving 5405cardinal numbers. For example, ``1 apple'' is ``1 alma'', and ``123 apples'' is 5406``123 alma''. But when the number is not explicit, the distinction between 5407singular and plural exists: ``the apple'' is ``az alma'', and ``the apples'' is 5408``az alm@'{a}k''. Since @code{ngettext} has to support both types of sentences, 5409it is classified here, under ``two forms''. 5410 5411@item Two forms, singular used for zero and one 5412Exceptional case in the language family. The header entry would be: 5413 5414@smallexample 5415Plural-Forms: nplurals=2; plural=n>1; 5416@end smallexample 5417 5418@noindent 5419Languages with this property include: 5420 5421@table @asis 5422@item Romanic family 5423French, Brazilian Portuguese 5424@end table 5425 5426@item Three forms, special case for zero 5427The header entry would be: 5428 5429@smallexample 5430Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2; 5431@end smallexample 5432 5433@noindent 5434Languages with this property include: 5435 5436@table @asis 5437@item Baltic family 5438Latvian 5439@end table 5440 5441@item Three forms, special cases for one and two 5442The header entry would be: 5443 5444@smallexample 5445Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2; 5446@end smallexample 5447 5448@noindent 5449Languages with this property include: 5450 5451@table @asis 5452@item Celtic 5453Gaeilge (Irish) 5454@end table 5455 5456@item Three forms, special case for numbers ending in 00 or [2-9][0-9] 5457The header entry would be: 5458 5459@smallexample 5460Plural-Forms: nplurals=3; \ 5461 plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2; 5462@end smallexample 5463 5464@noindent 5465Languages with this property include: 5466 5467@table @asis 5468@item Romanic family 5469Romanian 5470@end table 5471 5472@item Three forms, special case for numbers ending in 1[2-9] 5473The header entry would look like this: 5474 5475@smallexample 5476Plural-Forms: nplurals=3; \ 5477 plural=n%10==1 && n%100!=11 ? 0 : \ 5478 n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2; 5479@end smallexample 5480 5481@noindent 5482Languages with this property include: 5483 5484@table @asis 5485@item Baltic family 5486Lithuanian 5487@end table 5488 5489@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4] 5490The header entry would look like this: 5491 5492@smallexample 5493Plural-Forms: nplurals=3; \ 5494 plural=n%10==1 && n%100!=11 ? 0 : \ 5495 n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; 5496@end smallexample 5497 5498@noindent 5499Languages with this property include: 5500 5501@table @asis 5502@item Slavic family 5503Croatian, Serbian, Russian, Ukrainian 5504@end table 5505 5506@item Three forms, special cases for 1 and 2, 3, 4 5507The header entry would look like this: 5508 5509@smallexample 5510Plural-Forms: nplurals=3; \ 5511 plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2; 5512@end smallexample 5513 5514@noindent 5515Languages with this property include: 5516 5517@table @asis 5518@item Slavic family 5519Slovak, Czech 5520@end table 5521 5522@item Three forms, special case for one and some numbers ending in 2, 3, or 4 5523The header entry would look like this: 5524 5525@smallexample 5526Plural-Forms: nplurals=3; \ 5527 plural=n==1 ? 0 : \ 5528 n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; 5529@end smallexample 5530 5531@noindent 5532Languages with this property include: 5533 5534@table @asis 5535@item Slavic family 5536Polish 5537@end table 5538 5539@item Four forms, special case for one and all numbers ending in 02, 03, or 04 5540The header entry would look like this: 5541 5542@smallexample 5543Plural-Forms: nplurals=4; \ 5544 plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3; 5545@end smallexample 5546 5547@noindent 5548Languages with this property include: 5549 5550@table @asis 5551@item Slavic family 5552Slovenian 5553@end table 5554@end table 5555 5556You might now ask, @code{ngettext} handles only numbers @var{n} of type 5557@samp{unsigned long}. What about larger integer types? What about negative 5558numbers? What about floating-point numbers? 5559 5560About larger integer types, such as @samp{uintmax_t} or 5561@samp{unsigned long long}: they can be handled by reducing the value to a 5562range that fits in an @samp{unsigned long}. Simply casting the value to 5563@samp{unsigned long} would not do the right thing, since it would treat 5564@code{ULONG_MAX + 1} like zero, @code{ULONG_MAX + 2} like singular, and 5565the like. Here you can exploit the fact that all mentioned plural form 5566formulas eventually become periodic, with a period that is a divisor of 100 5567(or 1000 or 1000000). So, when you reduce a large value to another one in 5568the range [1000000, 1999999] that ends in the same 6 decimal digits, you 5569can assume that it will lead to the same plural form selection. This code 5570does this: 5571 5572@smallexample 5573#include <inttypes.h> 5574uintmax_t nbytes = ...; 5575printf (ngettext ("The file has %"PRIuMAX" byte.", 5576 "The file has %"PRIuMAX" bytes.", 5577 (nbytes > ULONG_MAX 5578 ? (nbytes % 1000000) + 1000000 5579 : nbytes)), 5580 nbytes); 5581@end smallexample 5582 5583Negative and floating-point values usually represent physical entities for 5584which singular and plural don't clearly apply. In such cases, there is no 5585need to use @code{ngettext}; a simple @code{gettext} call with a form suitable 5586for all values will do. For example: 5587 5588@smallexample 5589printf (gettext ("Time elapsed: %.3f seconds"), 5590 num_milliseconds * 0.001); 5591@end smallexample 5592 5593@noindent 5594Even if @var{num_milliseconds} happens to be a multiple of 1000, the output 5595@smallexample 5596Time elapsed: 1.000 seconds 5597@end smallexample 5598@noindent 5599is acceptable in English, and similarly for other languages. 5600 5601@node Optimized gettext, , Plural forms, gettext 5602@subsection Optimization of the *gettext functions 5603@cindex optimization of @code{gettext} functions 5604 5605At this point of the discussion we should talk about an advantage of the 5606GNU @code{gettext} implementation. Some readers might have pointed out 5607that an internationalized program might have a poor performance if some 5608string has to be translated in an inner loop. While this is unavoidable 5609when the string varies from one run of the loop to the other it is 5610simply a waste of time when the string is always the same. Take the 5611following example: 5612 5613@example 5614@group 5615@{ 5616 while (@dots{}) 5617 @{ 5618 puts (gettext ("Hello world")); 5619 @} 5620@} 5621@end group 5622@end example 5623 5624@noindent 5625When the locale selection does not change between two runs the resulting 5626string is always the same. One way to use this is: 5627 5628@example 5629@group 5630@{ 5631 str = gettext ("Hello world"); 5632 while (@dots{}) 5633 @{ 5634 puts (str); 5635 @} 5636@} 5637@end group 5638@end example 5639 5640@noindent 5641But this solution is not usable in all situation (e.g.@: when the locale 5642selection changes) nor does it lead to legible code. 5643 5644For this reason, GNU @code{gettext} caches previous translation results. 5645When the same translation is requested twice, with no new message 5646catalogs being loaded in between, @code{gettext} will, the second time, 5647find the result through a single cache lookup. 5648 5649@node Comparison, Using libintl.a, gettext, Programmers 5650@section Comparing the Two Interfaces 5651@cindex @code{gettext} vs @code{catgets} 5652@cindex comparison of interfaces 5653 5654@c FIXME: arguments to catgets vs. gettext 5655@c Partly done 950718 -- drepper 5656 5657The following discussion is perhaps a little bit colored. As said 5658above we implemented GNU @code{gettext} following the Uniforum 5659proposal and this surely has its reasons. But it should show how we 5660came to this decision. 5661 5662First we take a look at the developing process. When we write an 5663application using NLS provided by @code{gettext} we proceed as always. 5664Only when we come to a string which might be seen by the users and thus 5665has to be translated we use @code{gettext("@dots{}")} instead of 5666@code{"@dots{}"}. At the beginning of each source file (or in a central 5667header file) we define 5668 5669@example 5670#define gettext(String) (String) 5671@end example 5672 5673Even this definition can be avoided when the system supports the 5674@code{gettext} function in its C library. When we compile this code the 5675result is the same as if no NLS code is used. When you take a look at 5676the GNU @code{gettext} code you will see that we use @code{_("@dots{}")} 5677instead of @code{gettext("@dots{}")}. This reduces the number of 5678additional characters per translatable string to @emph{3} (in words: 5679three). 5680 5681When now a production version of the program is needed we simply replace 5682the definition 5683 5684@example 5685#define _(String) (String) 5686@end example 5687 5688@noindent 5689by 5690 5691@cindex include file @file{libintl.h} 5692@example 5693#include <libintl.h> 5694#define _(String) gettext (String) 5695@end example 5696 5697@noindent 5698Additionally we run the program @file{xgettext} on all source code file 5699which contain translatable strings and that's it: we have a running 5700program which does not depend on translations to be available, but which 5701can use any that becomes available. 5702 5703@cindex @code{N_}, a convenience macro 5704The same procedure can be done for the @code{gettext_noop} invocations 5705(@pxref{Special cases}). One usually defines @code{gettext_noop} as a 5706no-op macro. So you should consider the following code for your project: 5707 5708@example 5709#define gettext_noop(String) String 5710#define N_(String) gettext_noop (String) 5711@end example 5712 5713@code{N_} is a short form similar to @code{_}. The @file{Makefile} in 5714the @file{po/} directory of GNU @code{gettext} knows by default both of the 5715mentioned short forms so you are invited to follow this proposal for 5716your own ease. 5717 5718Now to @code{catgets}. The main problem is the work for the 5719programmer. Every time he comes to a translatable string he has to 5720define a number (or a symbolic constant) which has also be defined in 5721the message catalog file. He also has to take care for duplicate 5722entries, duplicate message IDs etc. If he wants to have the same 5723quality in the message catalog as the GNU @code{gettext} program 5724provides he also has to put the descriptive comments for the strings and 5725the location in all source code files in the message catalog. This is 5726nearly a Mission: Impossible. 5727 5728But there are also some points people might call advantages speaking for 5729@code{catgets}. If you have a single word in a string and this string 5730is used in different contexts it is likely that in one or the other 5731language the word has different translations. Example: 5732 5733@example 5734printf ("%s: %d", gettext ("number"), number_of_errors) 5735 5736printf ("you should see %d %s", number_count, 5737 number_count == 1 ? gettext ("number") : gettext ("numbers")) 5738@end example 5739 5740Here we have to translate two times the string @code{"number"}. Even 5741if you do not speak a language beside English it might be possible to 5742recognize that the two words have a different meaning. In German the 5743first appearance has to be translated to @code{"Anzahl"} and the second 5744to @code{"Zahl"}. 5745 5746Now you can say that this example is really esoteric. And you are 5747right! This is exactly how we felt about this problem and decide that 5748it does not weight that much. The solution for the above problem could 5749be very easy: 5750 5751@example 5752printf ("%s %d", gettext ("number:"), number_of_errors) 5753 5754printf (number_count == 1 ? gettext ("you should see %d number") 5755 : gettext ("you should see %d numbers"), 5756 number_count) 5757@end example 5758 5759We believe that we can solve all conflicts with this method. If it is 5760difficult one can also consider changing one of the conflicting string a 5761little bit. But it is not impossible to overcome. 5762 5763@code{catgets} allows same original entry to have different translations, 5764but @code{gettext} has another, scalable approach for solving ambiguities 5765of this kind: @xref{Ambiguities}. 5766 5767@node Using libintl.a, gettext grok, Comparison, Programmers 5768@section Using libintl.a in own programs 5769 5770Starting with version 0.9.4 the library @code{libintl.h} should be 5771self-contained. I.e., you can use it in your own programs without 5772providing additional functions. The @file{Makefile} will put the header 5773and the library in directories selected using the @code{$(prefix)}. 5774 5775@node gettext grok, Temp Programmers, Using libintl.a, Programmers 5776@section Being a @code{gettext} grok 5777 5778@strong{ NOTE: } This documentation section is outdated and needs to be 5779revised. 5780 5781To fully exploit the functionality of the GNU @code{gettext} library it 5782is surely helpful to read the source code. But for those who don't want 5783to spend that much time in reading the (sometimes complicated) code here 5784is a list comments: 5785 5786@itemize @bullet 5787@item Changing the language at runtime 5788@cindex language selection at runtime 5789 5790For interactive programs it might be useful to offer a selection of the 5791used language at runtime. To understand how to do this one need to know 5792how the used language is determined while executing the @code{gettext} 5793function. The method which is presented here only works correctly 5794with the GNU implementation of the @code{gettext} functions. 5795 5796In the function @code{dcgettext} at every call the current setting of 5797the highest priority environment variable is determined and used. 5798Highest priority means here the following list with decreasing 5799priority: 5800 5801@enumerate 5802@vindex LANGUAGE@r{, environment variable} 5803@item @code{LANGUAGE} 5804@vindex LC_ALL@r{, environment variable} 5805@item @code{LC_ALL} 5806@vindex LC_CTYPE@r{, environment variable} 5807@vindex LC_NUMERIC@r{, environment variable} 5808@vindex LC_TIME@r{, environment variable} 5809@vindex LC_COLLATE@r{, environment variable} 5810@vindex LC_MONETARY@r{, environment variable} 5811@vindex LC_MESSAGES@r{, environment variable} 5812@item @code{LC_xxx}, according to selected locale 5813@vindex LANG@r{, environment variable} 5814@item @code{LANG} 5815@end enumerate 5816 5817Afterwards the path is constructed using the found value and the 5818translation file is loaded if available. 5819 5820What happens now when the value for, say, @code{LANGUAGE} changes? According 5821to the process explained above the new value of this variable is found 5822as soon as the @code{dcgettext} function is called. But this also means 5823the (perhaps) different message catalog file is loaded. In other 5824words: the used language is changed. 5825 5826But there is one little hook. The code for gcc-2.7.0 and up provides 5827some optimization. This optimization normally prevents the calling of 5828the @code{dcgettext} function as long as no new catalog is loaded. But 5829if @code{dcgettext} is not called the program also cannot find the 5830@code{LANGUAGE} variable be changed (@pxref{Optimized gettext}). A 5831solution for this is very easy. Include the following code in the 5832language switching function. 5833 5834@example 5835 /* Change language. */ 5836 setenv ("LANGUAGE", "fr", 1); 5837 5838 /* Make change known. */ 5839 @{ 5840 extern int _nl_msg_cat_cntr; 5841 ++_nl_msg_cat_cntr; 5842 @} 5843@end example 5844 5845@cindex @code{_nl_msg_cat_cntr} 5846The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}. 5847You don't need to know what this is for. But it can be used to detect 5848whether a @code{gettext} implementation is GNU gettext and not non-GNU 5849system's native gettext implementation. 5850 5851@end itemize 5852 5853@node Temp Programmers, , gettext grok, Programmers 5854@section Temporary Notes for the Programmers Chapter 5855 5856@strong{ NOTE: } This documentation section is outdated and needs to be 5857revised. 5858 5859@menu 5860* Temp Implementations:: Temporary - Two Possible Implementations 5861* Temp catgets:: Temporary - About @code{catgets} 5862* Temp WSI:: Temporary - Why a single implementation 5863* Temp Notes:: Temporary - Notes 5864@end menu 5865 5866@node Temp Implementations, Temp catgets, Temp Programmers, Temp Programmers 5867@subsection Temporary - Two Possible Implementations 5868 5869There are two competing methods for language independent messages: 5870the X/Open @code{catgets} method, and the Uniforum @code{gettext} 5871method. The @code{catgets} method indexes messages by integers; the 5872@code{gettext} method indexes them by their English translations. 5873The @code{catgets} method has been around longer and is supported 5874by more vendors. The @code{gettext} method is supported by Sun, 5875and it has been heard that the COSE multi-vendor initiative is 5876supporting it. Neither method is a POSIX standard; the POSIX.1 5877committee had a lot of disagreement in this area. 5878 5879Neither one is in the POSIX standard. There was much disagreement 5880in the POSIX.1 committee about using the @code{gettext} routines 5881vs. @code{catgets} (XPG). In the end the committee couldn't 5882agree on anything, so no messaging system was included as part 5883of the standard. I believe the informative annex of the standard 5884includes the XPG3 messaging interfaces, ``@dots{}as an example of 5885a messaging system that has been implemented@dots{}'' 5886 5887They were very careful not to say anywhere that you should use one 5888set of interfaces over the other. For more on this topic please 5889see the Programming for Internationalization FAQ. 5890 5891@node Temp catgets, Temp WSI, Temp Implementations, Temp Programmers 5892@subsection Temporary - About @code{catgets} 5893 5894There have been a few discussions of late on the use of 5895@code{catgets} as a base. I think it important to present both 5896sides of the argument and hence am opting to play devil's advocate 5897for a little bit. 5898 5899I'll not deny the fact that @code{catgets} could have been designed 5900a lot better. It currently has quite a number of limitations and 5901these have already been pointed out. 5902 5903However there is a great deal to be said for consistency and 5904standardization. A common recurring problem when writing Unix 5905software is the myriad portability problems across Unix platforms. 5906It seems as if every Unix vendor had a look at the operating system 5907and found parts they could improve upon. Undoubtedly, these 5908modifications are probably innovative and solve real problems. 5909However, software developers have a hard time keeping up with all 5910these changes across so many platforms. 5911 5912And this has prompted the Unix vendors to begin to standardize their 5913systems. Hence the impetus for Spec1170. Every major Unix vendor 5914has committed to supporting this standard and every Unix software 5915developer waits with glee the day they can write software to this 5916standard and simply recompile (without having to use autoconf) 5917across different platforms. 5918 5919As I understand it, Spec1170 is roughly based upon version 4 of the 5920X/Open Portability Guidelines (XPG4). Because @code{catgets} and 5921friends are defined in XPG4, I'm led to believe that @code{catgets} 5922is a part of Spec1170 and hence will become a standardized component 5923of all Unix systems. 5924 5925@node Temp WSI, Temp Notes, Temp catgets, Temp Programmers 5926@subsection Temporary - Why a single implementation 5927 5928Now it seems kind of wasteful to me to have two different systems 5929installed for accessing message catalogs. If we do want to remedy 5930@code{catgets} deficiencies why don't we try to expand @code{catgets} 5931(in a compatible manner) rather than implement an entirely new system. 5932Otherwise, we'll end up with two message catalog access systems installed 5933with an operating system - one set of routines for packages using GNU 5934@code{gettext} for their internationalization, and another set of routines 5935(catgets) for all other software. Bloated? 5936 5937Supposing another catalog access system is implemented. Which do 5938we recommend? At least for Linux, we need to attract as many 5939software developers as possible. Hence we need to make it as easy 5940for them to port their software as possible. Which means supporting 5941@code{catgets}. We will be implementing the @code{libintl} code 5942within our @code{libc}, but does this mean we also have to incorporate 5943another message catalog access scheme within our @code{libc} as well? 5944And what about people who are going to be using the @code{libintl} 5945+ non-@code{catgets} routines. When they port their software to 5946other platforms, they're now going to have to include the front-end 5947(@code{libintl}) code plus the back-end code (the non-@code{catgets} 5948access routines) with their software instead of just including the 5949@code{libintl} code with their software. 5950 5951Message catalog support is however only the tip of the iceberg. 5952What about the data for the other locale categories. They also have 5953a number of deficiencies. Are we going to abandon them as well and 5954develop another duplicate set of routines (should @code{libintl} 5955expand beyond message catalog support)? 5956 5957Like many parts of Unix that can be improved upon, we're stuck with balancing 5958compatibility with the past with useful improvements and innovations for 5959the future. 5960 5961@node Temp Notes, , Temp WSI, Temp Programmers 5962@subsection Temporary - Notes 5963 5964X/Open agreed very late on the standard form so that many 5965implementations differ from the final form. Both of my system (old 5966Linux catgets and Ultrix-4) have a strange variation. 5967 5968OK. After incorporating the last changes I have to spend some time on 5969making the GNU/Linux @code{libc} @code{gettext} functions. So in future 5970Solaris is not the only system having @code{gettext}. 5971 5972@node Translators, Maintainers, Programmers, Top 5973@chapter The Translator's View 5974 5975@c FIXME: Reorganize whole chapter. 5976 5977@menu 5978* Trans Intro 0:: Introduction 0 5979* Trans Intro 1:: Introduction 1 5980* Discussions:: Discussions 5981* Organization:: Organization 5982* Information Flow:: Information Flow 5983* Prioritizing messages:: How to find which messages to translate first 5984@end menu 5985 5986@node Trans Intro 0, Trans Intro 1, Translators, Translators 5987@section Introduction 0 5988 5989@strong{ NOTE: } This documentation section is outdated and needs to be 5990revised. 5991 5992Free software is going international! The Translation Project is a way 5993to get maintainers, translators and users all together, so free software 5994will gradually become able to speak many native languages. 5995 5996The GNU @code{gettext} tool set contains @emph{everything} maintainers 5997need for internationalizing their packages for messages. It also 5998contains quite useful tools for helping translators at localizing 5999messages to their native language, once a package has already been 6000internationalized. 6001 6002To achieve the Translation Project, we need many interested 6003people who like their own language and write it well, and who are also 6004able to synergize with other translators speaking the same language. 6005If you'd like to volunteer to @emph{work} at translating messages, 6006please send mail to your translating team. 6007 6008Each team has its own mailing list, courtesy of Linux 6009International. You may reach your translating team at the address 6010@file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639} 6011code for your language. Language codes are @emph{not} the same as 6012country codes given in @w{ISO 3166}. The following translating teams 6013exist: 6014 6015@quotation 6016Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl}, 6017Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish 6018@code{ga}, German @code{de}, Greek @code{el}, Italian @code{it}, 6019Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish 6020@code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es}, 6021Swedish @code{sv} and Turkish @code{tr}. 6022@end quotation 6023 6024@noindent 6025For example, you may reach the Chinese translating team by writing to 6026@file{zh@@li.org}. When you become a member of the translating team 6027for your own language, you may subscribe to its list. For example, 6028Swedish people can send a message to @w{@file{sv-request@@li.org}}, 6029having this message body: 6030 6031@example 6032subscribe 6033@end example 6034 6035Keep in mind that team members should be interested in @emph{working} 6036at translations, or at solving translational difficulties, rather than 6037merely lurking around. If your team does not exist yet and you want to 6038start one, please write to @w{@file{translation@@iro.umontreal.ca}}; 6039you will then reach the coordinator for all translator teams. 6040 6041A handful of GNU packages have already been adapted and provided 6042with message translations for several languages. Translation 6043teams have begun to organize, using these packages as a starting 6044point. But there are many more packages and many languages for 6045which we have no volunteer translators. If you would like to 6046volunteer to work at translating messages, please send mail to 6047@file{translation@@iro.umontreal.ca} indicating what language(s) 6048you can work on. 6049 6050@node Trans Intro 1, Discussions, Trans Intro 0, Translators 6051@section Introduction 1 6052 6053@strong{ NOTE: } This documentation section is outdated and needs to be 6054revised. 6055 6056This is now official, GNU is going international! Here is the 6057announcement submitted for the January 1995 GNU Bulletin: 6058 6059@quotation 6060A handful of GNU packages have already been adapted and provided 6061with message translations for several languages. Translation 6062teams have begun to organize, using these packages as a starting 6063point. But there are many more packages and many languages 6064for which we have no volunteer translators. If you'd like to 6065volunteer to work at translating messages, please send mail to 6066@samp{translation@@iro.umontreal.ca} indicating what language(s) 6067you can work on. 6068@end quotation 6069 6070This document should answer many questions for those who are curious about 6071the process or would like to contribute. Please at least skim over it, 6072hoping to cut down a little of the high volume of e-mail generated by this 6073collective effort towards internationalization of free software. 6074 6075Most free programming which is widely shared is done in English, and 6076currently, English is used as the main communicating language between 6077national communities collaborating to free software. This very document 6078is written in English. This will not change in the foreseeable future. 6079 6080However, there is a strong appetite from national communities for 6081having more software able to write using national language and habits, 6082and there is an on-going effort to modify free software in such a way 6083that it becomes able to do so. The experiments driven so far raised 6084an enthusiastic response from pretesters, so we believe that 6085internationalization of free software is dedicated to succeed. 6086 6087For suggestion clarifications, additions or corrections to this 6088document, please e-mail to @file{translation@@iro.umontreal.ca}. 6089 6090@node Discussions, Organization, Trans Intro 1, Translators 6091@section Discussions 6092 6093@strong{ NOTE: } This documentation section is outdated and needs to be 6094revised. 6095 6096Facing this internationalization effort, a few users expressed their 6097concerns. Some of these doubts are presented and discussed, here. 6098 6099@itemize @bullet 6100@item Smaller groups 6101 6102Some languages are not spoken by a very large number of people, so people 6103speaking them sometimes consider that there may not be all that much 6104demand such versions of free software packages. Moreover, many people 6105being @emph{into computers}, in some countries, generally seem to prefer 6106English versions of their software. 6107 6108On the other end, people might enjoy their own language a lot, and be 6109very motivated at providing to themselves the pleasure of having their 6110beloved free software speaking their mother tongue. They do themselves 6111a personal favor, and do not pay that much attention to the number of 6112people benefiting of their work. 6113 6114@item Misinterpretation 6115 6116Other users are shy to push forward their own language, seeing in this 6117some kind of misplaced propaganda. Someone thought there must be some 6118users of the language over the networks pestering other people with it. 6119 6120But any spoken language is worth localization, because there are 6121people behind the language for whom the language is important and 6122dear to their hearts. 6123 6124@item Odd translations 6125 6126The biggest problem is to find the right translations so that 6127everybody can understand the messages. Translations are usually a 6128little odd. Some people get used to English, to the extent they may 6129find translations into their own language ``rather pushy, obnoxious 6130and sometimes even hilarious.'' As a French speaking man, I have 6131the experience of those instruction manuals for goods, so poorly 6132translated in French in Korea or Taiwan@dots{} 6133 6134The fact is that we sometimes have to create a kind of national 6135computer culture, and this is not easy without the collaboration of 6136many people liking their mother tongue. This is why translations are 6137better achieved by people knowing and loving their own language, and 6138ready to work together at improving the results they obtain. 6139 6140@item Dependencies over the GPL or LGPL 6141 6142Some people wonder if using GNU @code{gettext} necessarily brings their 6143package under the protective wing of the GNU General Public License or 6144the GNU Library General Public License, when they do not want to make 6145their program free, or want other kinds of freedom. The simplest 6146answer is ``normally not''. 6147 6148The @code{gettext-runtime} part of GNU @code{gettext}, i.e.@: the 6149contents of @code{libintl}, is covered by the GNU Library General Public 6150License. The @code{gettext-tools} part of GNU @code{gettext}, i.e.@: the 6151rest of the GNU @code{gettext} package, is covered by the GNU General 6152Public License. 6153 6154The mere marking of localizable strings in a package, or conditional 6155inclusion of a few lines for initialization, is not really including 6156GPL'ed or LGPL'ed code. However, since the localization routines in 6157@code{libintl} are under the LGPL, the LGPL needs to be considered. 6158It gives the right to distribute the complete unmodified source of 6159@code{libintl} even with non-free programs. It also gives the right 6160to use @code{libintl} as a shared library, even for non-free programs. 6161But it gives the right to use @code{libintl} as a static library or 6162to incorporate @code{libintl} into another library only to free 6163software. 6164 6165@end itemize 6166 6167@node Organization, Information Flow, Discussions, Translators 6168@section Organization 6169 6170@strong{ NOTE: } This documentation section is outdated and needs to be 6171revised. 6172 6173On a larger scale, the true solution would be to organize some kind of 6174fairly precise set up in which volunteers could participate. I gave 6175some thought to this idea lately, and realize there will be some 6176touchy points. I thought of writing to Richard Stallman to launch 6177such a project, but feel it might be good to shake out the ideas 6178between ourselves first. Most probably that Linux International has 6179some experience in the field already, or would like to orchestrate 6180the volunteer work, maybe. Food for thought, in any case! 6181 6182I guess we have to setup something early, somehow, that will help 6183many possible contributors of the same language to interlock and avoid 6184work duplication, and further be put in contact for solving together 6185problems particular to their tongue (in most languages, there are many 6186difficulties peculiar to translating technical English). My Swedish 6187contributor acknowledged these difficulties, and I'm well aware of 6188them for French. 6189 6190This is surely not a technical issue, but we should manage so the 6191effort of locale contributors be maximally useful, despite the national 6192team layer interface between contributors and maintainers. 6193 6194The Translation Project needs some setup for coordinating language 6195coordinators. Localizing evolving programs will surely 6196become a permanent and continuous activity in the free software community, 6197once well started. 6198The setup should be minimally completed and tested before GNU 6199@code{gettext} becomes an official reality. The e-mail address 6200@file{translation@@iro.umontreal.ca} has been setup for receiving 6201offers from volunteers and general e-mail on these topics. This address 6202reaches the Translation Project coordinator. 6203 6204@menu 6205* Central Coordination:: Central Coordination 6206* National Teams:: National Teams 6207* Mailing Lists:: Mailing Lists 6208@end menu 6209 6210@node Central Coordination, National Teams, Organization, Organization 6211@subsection Central Coordination 6212 6213I also think GNU will need sooner than it thinks, that someone setup 6214a way to organize and coordinate these groups. Some kind of group 6215of groups. My opinion is that it would be good that GNU delegates 6216this task to a small group of collaborating volunteers, shortly. 6217Perhaps in @file{gnu.announce} a list of this national committee's 6218can be published. 6219 6220My role as coordinator would simply be to refer to Ulrich any German 6221speaking volunteer interested to localization of free software packages, and 6222maybe helping national groups to initially organize, while maintaining 6223national registries for until national groups are ready to take over. 6224In fact, the coordinator should ease volunteers to get in contact with 6225one another for creating national teams, which should then select 6226one coordinator per language, or country (regionalized language). 6227If well done, the coordination should be useful without being an 6228overwhelming task, the time to put delegations in place. 6229 6230@node National Teams, Mailing Lists, Central Coordination, Organization 6231@subsection National Teams 6232 6233I suggest we look for volunteer coordinators/editors for individual 6234languages. These people will scan contributions of translation files 6235for various programs, for their own languages, and will ensure high 6236and uniform standards of diction. 6237 6238From my current experience with other people in these days, those who 6239provide localizations are very enthusiastic about the process, and are 6240more interested in the localization process than in the program they 6241localize, and want to do many programs, not just one. This seems 6242to confirm that having a coordinator/editor for each language is a 6243good idea. 6244 6245We need to choose someone who is good at writing clear and concise 6246prose in the language in question. That is hard---we can't check 6247it ourselves. So we need to ask a few people to judge each others' 6248writing and select the one who is best. 6249 6250I announce my prerelease to a few dozen people, and you would not 6251believe all the discussions it generated already. I shudder to think 6252what will happen when this will be launched, for true, officially, 6253world wide. Who am I to arbitrate between two Czekolsovak users 6254contradicting each other, for example? 6255 6256I assume that your German is not much better than my French so that 6257I would not be able to judge about these formulations. What I would 6258suggest is that for each language there is a group for people who 6259maintain the PO files and judge about changes. I suspect there will 6260be cultural differences between how such groups of people will behave. 6261Some will have relaxed ways, reach consensus easily, and have anyone 6262of the group relate to the maintainers, while others will fight to 6263death, organize heavy administrations up to national standards, and 6264use strict channels. 6265 6266The German team is putting out a good example. Right now, they are 6267maybe half a dozen people revising translations of each other and 6268discussing the linguistic issues. I do not even have all the names. 6269Ulrich Drepper is taking care of coordinating the German team. 6270He subscribed to all my pretest lists, so I do not even have to warn 6271him specifically of incoming releases. 6272 6273I'm sure, that is a good idea to get teams for each language working 6274on translations. That will make the translations better and more 6275consistent. 6276 6277@menu 6278* Sub-Cultures:: Sub-Cultures 6279* Organizational Ideas:: Organizational Ideas 6280@end menu 6281 6282@node Sub-Cultures, Organizational Ideas, National Teams, National Teams 6283@subsubsection Sub-Cultures 6284 6285Taking French for example, there are a few sub-cultures around computers 6286which developed diverging vocabularies. Picking volunteers here and 6287there without addressing this problem in an organized way, soon in the 6288project, might produce a distasteful mix of internationalized programs, 6289and possibly trigger endless quarrels among those who really care. 6290 6291Keeping some kind of unity in the way French localization of 6292internationalized programs is achieved is a difficult (and delicate) job. 6293Knowing the latin character of French people (:-), if we take this 6294the wrong way, we could end up nowhere, or spoil a lot of energies. 6295Maybe we should begin to address this problem seriously @emph{before} 6296GNU @code{gettext} become officially published. And I suspect that this 6297means soon! 6298 6299@node Organizational Ideas, , Sub-Cultures, National Teams 6300@subsubsection Organizational Ideas 6301 6302I expect the next big changes after the official release. Please note 6303that I use the German translation of the short GPL message. We need 6304to set a few good examples before the localization goes out for true 6305in the free software community. Here are a few points to discuss: 6306 6307@itemize @bullet 6308@item 6309Each group should have one FTP server (at least one master). 6310 6311@item 6312The files on the server should reflect the latest version (of 6313course!) and it should also contain a RCS directory with the 6314corresponding archives (I don't have this now). 6315 6316@item 6317There should also be a ChangeLog file (this is more useful than the 6318RCS archive but can be generated automatically from the later by 6319Emacs). 6320 6321@item 6322A @dfn{core group} should judge about questionable changes (for now 6323this group consists solely by me but I ask some others occasionally; 6324this also seems to work). 6325 6326@end itemize 6327 6328@node Mailing Lists, , National Teams, Organization 6329@subsection Mailing Lists 6330 6331If we get any inquiries about GNU @code{gettext}, send them on to: 6332 6333@example 6334@file{translation@@iro.umontreal.ca} 6335@end example 6336 6337The @file{*-pretest} lists are quite useful to me, maybe the idea could 6338be generalized to many GNU, and non-GNU packages. But each maintainer 6339his/her way! 6340 6341Fran@,{c}ois, we have a mechanism in place here at 6342@file{gnu.ai.mit.edu} to track teams, support mailing lists for 6343them and log members. We have a slight preference that you use it. 6344If this is OK with you, I can get you clued in. 6345 6346Things are changing! A few years ago, when Daniel Fekete and I 6347asked for a mailing list for GNU localization, nested at the FSF, we 6348were politely invited to organize it anywhere else, and so did we. 6349For communicating with my pretesters, I later made a handful of 6350mailing lists located at iro.umontreal.ca and administrated by 6351@code{majordomo}. These lists have been @emph{very} dependable 6352so far@dots{} 6353 6354I suspect that the German team will organize itself a mailing list 6355located in Germany, and so forth for other countries. But before they 6356organize for true, it could surely be useful to offer mailing lists 6357located at the FSF to each national team. So yes, please explain me 6358how I should proceed to create and handle them. 6359 6360We should create temporary mailing lists, one per country, to help 6361people organize. Temporary, because once regrouped and structured, it 6362would be fair the volunteers from country bring back @emph{their} list 6363in there and manage it as they want. My feeling is that, in the long 6364run, each team should run its own list, from within their country. 6365There also should be some central list to which all teams could 6366subscribe as they see fit, as long as each team is represented in it. 6367 6368@node Information Flow, Prioritizing messages, Organization, Translators 6369@section Information Flow 6370 6371@strong{ NOTE: } This documentation section is outdated and needs to be 6372revised. 6373 6374There will surely be some discussion about this messages after the 6375packages are finally released. If people now send you some proposals 6376for better messages, how do you proceed? Jim, please note that 6377right now, as I put forward nearly a dozen of localizable programs, I 6378receive both the translations and the coordination concerns about them. 6379 6380If I put one of my things to pretest, Ulrich receives the announcement 6381and passes it on to the German team, who make last minute revisions. 6382Then he submits the translation files to me @emph{as the maintainer}. 6383For free packages I do not maintain, I would not even hear about it. 6384This scheme could be made to work for the whole Translation Project, 6385I think. For security reasons, maybe Ulrich (national coordinators, 6386in fact) should update central registry kept at the Translation Project 6387(Jim, me, or Len's recruits) once in a while. 6388 6389In December/January, I was aggressively ready to internationalize 6390all of GNU, giving myself the duty of one small GNU package per week 6391or so, taking many weeks or months for bigger packages. But it does 6392not work this way. I first did all the things I'm responsible for. 6393I've nothing against some missionary work on other maintainers, but 6394I'm also loosing a lot of energy over it---same debates over again. 6395 6396And when the first localized packages are released we'll get a lot of 6397responses about ugly translations :-). Surely, and we need to have 6398beforehand a fairly good idea about how to handle the information 6399flow between the national teams and the package maintainers. 6400 6401Please start saving somewhere a quick history of each PO file. I know 6402for sure that the file format will change, allowing for comments. 6403It would be nice that each file has a kind of log, and references for 6404those who want to submit comments or gripes, or otherwise contribute. 6405I sent a proposal for a fast and flexible format, but it is not 6406receiving acceptance yet by the GNU deciders. I'll tell you when I 6407have more information about this. 6408 6409@node Prioritizing messages, , Information Flow, Translators 6410@section Prioritizing messages: How to determine which messages to translate first 6411 6412A translator sometimes has only a limited amount of time per week to 6413spend on a package, and some packages have quite large message catalogs 6414(over 1000 messages). Therefore she wishes to translate the messages 6415first that are the most visible to the user, or that occur most frequently. 6416This section describes how to determine these "most urgent" messages. 6417It also applies to determine the "next most urgent" messages after the 6418message catalog has already been partially translated. 6419 6420In a first step, she uses the programs like a user would do. While she 6421does this, the GNU @code{gettext} library logs into a file the not yet 6422translated messages for which a translation was requested from the program. 6423 6424In a second step, she uses the PO mode to translate precisely this set 6425of messages. 6426 6427@vindex GETTEXT_LOG_UNTRANSLATED@r{, environment variable} 6428Here a more details. The GNU @code{libintl} library (but not the 6429corresponding functions in GNU @code{libc}) supports an environment variable 6430@code{GETTEXT_LOG_UNTRANSLATED}. The GNU @code{libintl} library will 6431log into this file the messages for which @code{gettext()} and related 6432functions couldn't find the translation. If the file doesn't exist, it 6433will be created as needed. On systems with GNU @code{libc} a shared library 6434@samp{preloadable_libintl.so} is provided that can be used with the ELF 6435@samp{LD_PRELOAD} mechanism. 6436 6437So, in the first step, the translator uses these commands on systems with 6438GNU @code{libc}: 6439 6440@smallexample 6441$ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so 6442$ export LD_PRELOAD 6443$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused 6444$ export GETTEXT_LOG_UNTRANSLATED 6445@end smallexample 6446 6447@noindent 6448and these commands on other systems: 6449 6450@smallexample 6451$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused 6452$ export GETTEXT_LOG_UNTRANSLATED 6453@end smallexample 6454 6455Then she uses and peruses the programs. (It is a good and recommended 6456practice to use the programs for which you provide translations: it 6457gives you the needed context.) When done, she removes the environment 6458variables: 6459 6460@smallexample 6461$ unset LD_PRELOAD 6462$ unset GETTEXT_LOG_UNTRANSLATED 6463@end smallexample 6464 6465The second step starts with removing duplicates: 6466 6467@smallexample 6468$ msguniq $HOME/gettextlogused > missing.po 6469@end smallexample 6470 6471The result is a PO file, but needs some preprocessing before a PO file editor 6472can be used with it. First, it is a multi-domain PO file, containing 6473messages from many translation domains. Second, it lacks all translator 6474comments and source references. Here is how to get a list of the affected 6475translation domains: 6476 6477@smallexample 6478$ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq 6479@end smallexample 6480 6481Then the translator can handle the domains one by one. For simplicity, 6482let's use environment variables to denote the language, domain and source 6483package. 6484 6485@smallexample 6486$ lang=nl # your language 6487$ domain=coreutils # the name of the domain to be handled 6488$ package=/usr/src/gnu/coreutils-4.5.4 # the package where it comes from 6489@end smallexample 6490 6491She takes the latest copy of @file{$lang.po} from the Translation Project, 6492or from the package (in most cases, @file{$package/po/$lang.po}), or 6493creates a fresh one if she's the first translator (see @ref{Creating}). 6494She then uses the following commands to mark the not urgent messages as 6495"obsolete". (This doesn't mean that these messages - translated and 6496untranslated ones - will go away. It simply means that the PO file editor 6497will ignore them in the following editing session.) 6498 6499@smallexample 6500$ msggrep --domain=$domain missing.po | grep -v '^domain' \ 6501 > $domain-missing.po 6502$ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \ 6503 > $domain.$lang-urgent.po 6504@end smallexample 6505 6506The she translates @file{$domain.$lang-urgent.po} by use of a PO file editor 6507(@pxref{Editing}). 6508(FIXME: I don't know whether @code{KBabel} and @code{gtranslator} also 6509preserve obsolete messages, as they should.) 6510Finally she restores the not urgent messages (with their earlier 6511translations, for those which were already translated) through this command: 6512 6513@smallexample 6514$ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \ 6515 > $domain.$lang.po 6516@end smallexample 6517 6518Then she can submit @file{$domain.$lang.po} and proceed to the next domain. 6519 6520@node Maintainers, Installers, Translators, Top 6521@chapter The Maintainer's View 6522@cindex package maintainer's view of @code{gettext} 6523 6524The maintainer of a package has many responsibilities. One of them 6525is ensuring that the package will install easily on many platforms, 6526and that the magic we described earlier (@pxref{Users}) will work 6527for installers and end users. 6528 6529Of course, there are many possible ways by which GNU @code{gettext} 6530might be integrated in a distribution, and this chapter does not cover 6531them in all generality. Instead, it details one possible approach which 6532is especially adequate for many free software distributions following GNU 6533standards, or even better, Gnits standards, because GNU @code{gettext} 6534is purposely for helping the internationalization of the whole GNU 6535project, and as many other good free packages as possible. So, the 6536maintainer's view presented here presumes that the package already has 6537a @file{configure.in} file and uses GNU Autoconf. 6538 6539Nevertheless, GNU @code{gettext} may surely be useful for free packages 6540not following GNU standards and conventions, but the maintainers of such 6541packages might have to show imagination and initiative in organizing 6542their distributions so @code{gettext} work for them in all situations. 6543There are surely many, out there. 6544 6545Even if @code{gettext} methods are now stabilizing, slight adjustments 6546might be needed between successive @code{gettext} versions, so you 6547should ideally revise this chapter in subsequent releases, looking 6548for changes. 6549 6550@menu 6551* Flat and Non-Flat:: Flat or Non-Flat Directory Structures 6552* Prerequisites:: Prerequisite Works 6553* gettextize Invocation:: Invoking the @code{gettextize} Program 6554* Adjusting Files:: Files You Must Create or Alter 6555* autoconf macros:: Autoconf macros for use in @file{configure.in} 6556* CVS Issues:: Integrating with CVS 6557* Release Management:: Creating a Distribution Tarball 6558@end menu 6559 6560@node Flat and Non-Flat, Prerequisites, Maintainers, Maintainers 6561@section Flat or Non-Flat Directory Structures 6562 6563Some free software packages are distributed as @code{tar} files which unpack 6564in a single directory, these are said to be @dfn{flat} distributions. 6565Other free software packages have a one level hierarchy of subdirectories, using 6566for example a subdirectory named @file{doc/} for the Texinfo manual and 6567man pages, another called @file{lib/} for holding functions meant to 6568replace or complement C libraries, and a subdirectory @file{src/} for 6569holding the proper sources for the package. These other distributions 6570are said to be @dfn{non-flat}. 6571 6572We cannot say much about flat distributions. A flat 6573directory structure has the disadvantage of increasing the difficulty 6574of updating to a new version of GNU @code{gettext}. Also, if you have 6575many PO files, this could somewhat pollute your single directory. 6576Also, GNU @code{gettext}'s libintl sources consist of C sources, shell 6577scripts, @code{sed} scripts and complicated Makefile rules, which don't 6578fit well into an existing flat structure. For these reasons, we 6579recommend to use non-flat approach in this case as well. 6580 6581Maybe because GNU @code{gettext} itself has a non-flat structure, 6582we have more experience with this approach, and this is what will be 6583described in the remaining of this chapter. Some maintainers might 6584use this as an opportunity to unflatten their package structure. 6585 6586@node Prerequisites, gettextize Invocation, Flat and Non-Flat, Maintainers 6587@section Prerequisite Works 6588@cindex converting a package to use @code{gettext} 6589@cindex migration from earlier versions of @code{gettext} 6590@cindex upgrading to new versions of @code{gettext} 6591 6592There are some works which are required for using GNU @code{gettext} 6593in one of your package. These works have some kind of generality 6594that escape the point by point descriptions used in the remainder 6595of this chapter. So, we describe them here. 6596 6597@itemize @bullet 6598@item 6599Before attempting to use @code{gettextize} you should install some 6600other packages first. 6601Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU 6602@code{gettext} are already installed at your site, and if not, proceed 6603to do this first. If you get to install these things, beware that 6604GNU @code{m4} must be fully installed before GNU Autoconf is even 6605@emph{configured}. 6606 6607To further ease the task of a package maintainer the @code{automake} 6608package was designed and implemented. GNU @code{gettext} now uses this 6609tool and the @file{Makefile}s in the @file{intl/} and @file{po/} 6610therefore know about all the goals necessary for using @code{automake} 6611and @file{libintl} in one project. 6612 6613Those four packages are only needed by you, as a maintainer; the 6614installers of your own package and end users do not really need any of 6615GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake} 6616for successfully installing and running your package, with messages 6617properly translated. But this is not completely true if you provide 6618internationalized shell scripts within your own package: GNU 6619@code{gettext} shall then be installed at the user site if the end users 6620want to see the translation of shell script messages. 6621 6622@item 6623Your package should use Autoconf and have a @file{configure.in} or 6624@file{configure.ac} file. 6625If it does not, you have to learn how. The Autoconf documentation 6626is quite well written, it is a good idea that you print it and get 6627familiar with it. 6628 6629@item 6630Your C sources should have already been modified according to 6631instructions given earlier in this manual. @xref{Sources}. 6632 6633@item 6634Your @file{po/} directory should receive all PO files submitted to you 6635by the translator teams, each having @file{@var{ll}.po} as a name. 6636This is not usually easy to get translation 6637work done before your package gets internationalized and available! 6638Since the cycle has to start somewhere, the easiest for the maintainer 6639is to start with absolutely no PO files, and wait until various 6640translator teams get interested in your package, and submit PO files. 6641 6642@end itemize 6643 6644It is worth adding here a few words about how the maintainer should 6645ideally behave with PO files submissions. As a maintainer, your role is 6646to authenticate the origin of the submission as being the representative 6647of the appropriate translating teams of the Translation Project (forward 6648the submission to @file{translation@@iro.umontreal.ca} in case of doubt), 6649to ensure that the PO file format is not severely broken and does not 6650prevent successful installation, and for the rest, to merely put these 6651PO files in @file{po/} for distribution. 6652 6653As a maintainer, you do not have to take on your shoulders the 6654responsibility of checking if the translations are adequate or 6655complete, and should avoid diving into linguistic matters. Translation 6656teams drive themselves and are fully responsible of their linguistic 6657choices for the Translation Project. Keep in mind that translator teams are @emph{not} 6658driven by maintainers. You can help by carefully redirecting all 6659communications and reports from users about linguistic matters to the 6660appropriate translation team, or explain users how to reach or join 6661their team. The simplest might be to send them the @file{ABOUT-NLS} file. 6662 6663Maintainers should @emph{never ever} apply PO file bug reports 6664themselves, short-cutting translation teams. If some translator has 6665difficulty to get some of her points through her team, it should not be 6666an option for her to directly negotiate translations with maintainers. 6667Teams ought to settle their problems themselves, if any. If you, as 6668a maintainer, ever think there is a real problem with a team, please 6669never try to @emph{solve} a team's problem on your own. 6670 6671@node gettextize Invocation, Adjusting Files, Prerequisites, Maintainers 6672@section Invoking the @code{gettextize} Program 6673 6674@include gettextize.texi 6675 6676@node Adjusting Files, autoconf macros, gettextize Invocation, Maintainers 6677@section Files You Must Create or Alter 6678@cindex @code{gettext} files 6679 6680Besides files which are automatically added through @code{gettextize}, 6681there are many files needing revision for properly interacting with 6682GNU @code{gettext}. If you are closely following GNU standards for 6683Makefile engineering and auto-configuration, the adaptations should 6684be easier to achieve. Here is a point by point description of the 6685changes needed in each. 6686 6687So, here comes a list of files, each one followed by a description of 6688all alterations it needs. Many examples are taken out from the GNU 6689@code{gettext} @value{VERSION} distribution itself, or from the GNU 6690@code{hello} distribution (@uref{http://www.franken.de/users/gnu/ke/hello} 6691or @uref{http://www.gnu.franken.de/ke/hello/}) You may indeed 6692refer to the source code of the GNU @code{gettext} and GNU @code{hello} 6693packages, as they are intended to be good examples for using GNU 6694gettext functionality. 6695 6696@menu 6697* po/POTFILES.in:: @file{POTFILES.in} in @file{po/} 6698* po/LINGUAS:: @file{LINGUAS} in @file{po/} 6699* po/Makevars:: @file{Makevars} in @file{po/} 6700* po/Rules-*:: Extending @file{Makefile} in @file{po/} 6701* configure.in:: @file{configure.in} at top level 6702* config.guess:: @file{config.guess}, @file{config.sub} at top level 6703* mkinstalldirs:: @file{mkinstalldirs} at top level 6704* aclocal:: @file{aclocal.m4} at top level 6705* acconfig:: @file{acconfig.h} at top level 6706* config.h.in:: @file{config.h.in} at top level 6707* Makefile:: @file{Makefile.in} at top level 6708* src/Makefile:: @file{Makefile.in} in @file{src/} 6709* lib/gettext.h:: @file{gettext.h} in @file{lib/} 6710@end menu 6711 6712@node po/POTFILES.in, po/LINGUAS, Adjusting Files, Adjusting Files 6713@subsection @file{POTFILES.in} in @file{po/} 6714@cindex @file{POTFILES.in} file 6715 6716The @file{po/} directory should receive a file named 6717@file{POTFILES.in}. This file tells which files, among all program 6718sources, have marked strings needing translation. Here is an example 6719of such a file: 6720 6721@example 6722@group 6723# List of source files containing translatable strings. 6724# Copyright (C) 1995 Free Software Foundation, Inc. 6725 6726# Common library files 6727lib/error.c 6728lib/getopt.c 6729lib/xmalloc.c 6730 6731# Package source files 6732src/gettext.c 6733src/msgfmt.c 6734src/xgettext.c 6735@end group 6736@end example 6737 6738@noindent 6739Hash-marked comments and white lines are ignored. All other lines 6740list those source files containing strings marked for translation 6741(@pxref{Mark Keywords}), in a notation relative to the top level 6742of your whole distribution, rather than the location of the 6743@file{POTFILES.in} file itself. 6744 6745When a C file is automatically generated by a tool, like @code{flex} or 6746@code{bison}, that doesn't introduce translatable strings by itself, 6747it is recommended to list in @file{po/POTFILES.in} the real source file 6748(ending in @file{.l} in the case of @code{flex}, or in @file{.y} in the 6749case of @code{bison}), not the generated C file. 6750 6751@node po/LINGUAS, po/Makevars, po/POTFILES.in, Adjusting Files 6752@subsection @file{LINGUAS} in @file{po/} 6753@cindex @file{LINGUAS} file 6754 6755The @file{po/} directory should also receive a file named 6756@file{LINGUAS}. This file contains the list of available translations. 6757It is a whitespace separated list. Hash-marked comments and white lines 6758are ignored. Here is an example file: 6759 6760@example 6761@group 6762# Set of available languages. 6763de fr 6764@end group 6765@end example 6766 6767@noindent 6768This example means that German and French PO files are available, so 6769that these languages are currently supported by your package. If you 6770want to further restrict, at installation time, the set of installed 6771languages, this should not be done by modifying the @file{LINGUAS} file, 6772but rather by using the @code{LINGUAS} environment variable 6773(@pxref{Installers}). 6774 6775It is recommended that you add the "languages" @samp{en@@quot} and 6776@samp{en@@boldquot} to the @code{LINGUAS} file. @code{en@@quot} is a 6777variant of English message catalogs (@code{en}) which uses real quotation 6778marks instead of the ugly looking asymmetric ASCII substitutes @samp{`} 6779and @samp{'}. @code{en@@boldquot} is a variant of @code{en@@quot} that 6780additionally outputs quoted pieces of text in a bold font, when used in 6781a terminal emulator which supports the VT100 escape sequences (such as 6782@code{xterm} or the Linux console, but not Emacs in @kbd{M-x shell} mode). 6783 6784These extra message catalogs @samp{en@@quot} and @samp{en@@boldquot} 6785are constructed automatically, not by translators; to support them, you 6786need the files @file{Rules-quot}, @file{quot.sed}, @file{boldquot.sed}, 6787@file{en@@quot.header}, @file{en@@boldquot.header}, @file{insert-header.sin} 6788in the @file{po/} directory. You can copy them from GNU gettext's @file{po/} 6789directory; they are also installed by running @code{gettextize}. 6790 6791@node po/Makevars, po/Rules-*, po/LINGUAS, Adjusting Files 6792@subsection @file{Makevars} in @file{po/} 6793@cindex @file{Makevars} file 6794 6795The @file{po/} directory also has a file named @file{Makevars}. It 6796contains variables that are specific to your project. @file{po/Makevars} 6797gets inserted into the @file{po/Makefile} when the latter is created. 6798The variables thus take effect when the POT file is created or updated, 6799and when the message catalogs get installed. 6800 6801The first three variables can be left unmodified if your package has a 6802single message domain and, accordingly, a single @file{po/} directory. 6803Only packages which have multiple @file{po/} directories at different 6804locations need to adjust the three first variables defined in 6805@file{Makevars}. 6806 6807@node po/Rules-*, configure.in, po/Makevars, Adjusting Files 6808@subsection Extending @file{Makefile} in @file{po/} 6809@cindex @file{Makefile.in.in} extensions 6810 6811All files called @file{Rules-*} in the @file{po/} directory get appended to 6812the @file{po/Makefile} when it is created. They present an opportunity to 6813add rules for special PO files to the Makefile, without needing to mess 6814with @file{po/Makefile.in.in}. 6815 6816@cindex quotation marks 6817@vindex LANGUAGE@r{, environment variable} 6818GNU gettext comes with a @file{Rules-quot} file, containing rules for 6819building catalogs @file{en@@quot.po} and @file{en@@boldquot.po}. The 6820effect of @file{en@@quot.po} is that people who set their @code{LANGUAGE} 6821environment variable to @samp{en@@quot} will get messages with proper 6822looking symmetric Unicode quotation marks instead of abusing the ASCII 6823grave accent and the ASCII apostrophe for indicating quotations. To 6824enable this catalog, simply add @code{en@@quot} to the @file{po/LINGUAS} 6825file. The effect of @file{en@@boldquot.po} is that people who set 6826@code{LANGUAGE} to @samp{en@@boldquot} will get not only proper quotation 6827marks, but also the quoted text will be shown in a bold font on terminals 6828and consoles. This catalog is useful only for command-line programs, not 6829GUI programs. To enable it, similarly add @code{en@@boldquot} to the 6830@file{po/LINGUAS} file. 6831 6832Similarly, you can create rules for building message catalogs for the 6833@file{sr@@latin} locale -- Serbian written with the Latin alphabet -- 6834from those for the @file{sr} locale -- Serbian written with Cyrillic 6835letters. See @ref{msgfilter Invocation}. 6836 6837@node configure.in, config.guess, po/Rules-*, Adjusting Files 6838@subsection @file{configure.in} at top level 6839 6840@file{configure.in} or @file{configure.ac} - this is the source from which 6841@code{autoconf} generates the @file{configure} script. 6842 6843@enumerate 6844@item Declare the package and version. 6845@cindex package and version declaration in @file{configure.in} 6846 6847This is done by a set of lines like these: 6848 6849@example 6850PACKAGE=gettext 6851VERSION=@value{VERSION} 6852AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE") 6853AC_DEFINE_UNQUOTED(VERSION, "$VERSION") 6854AC_SUBST(PACKAGE) 6855AC_SUBST(VERSION) 6856@end example 6857 6858@noindent 6859or, if you are using GNU @code{automake}, by a line like this: 6860 6861@example 6862AM_INIT_AUTOMAKE(gettext, @value{VERSION}) 6863@end example 6864 6865@noindent 6866Of course, you replace @samp{gettext} with the name of your package, 6867and @samp{@value{VERSION}} by its version numbers, exactly as they 6868should appear in the packaged @code{tar} file name of your distribution 6869(@file{gettext-@value{VERSION}.tar.gz}, here). 6870 6871@item Check for internationalization support. 6872 6873Here is the main @code{m4} macro for triggering internationalization 6874support. Just add this line to @file{configure.in}: 6875 6876@example 6877AM_GNU_GETTEXT 6878@end example 6879 6880@noindent 6881This call is purposely simple, even if it generates a lot of configure 6882time checking and actions. 6883 6884If you have suppressed the @file{intl/} subdirectory by calling 6885@code{gettextize} without @samp{--intl} option, this call should read 6886 6887@example 6888AM_GNU_GETTEXT([external]) 6889@end example 6890 6891@item Have output files created. 6892 6893The @code{AC_OUTPUT} directive, at the end of your @file{configure.in} 6894file, needs to be modified in two ways: 6895 6896@example 6897AC_OUTPUT([@var{existing configuration files} intl/Makefile po/Makefile.in], 6898[@var{existing additional actions}]) 6899@end example 6900 6901The modification to the first argument to @code{AC_OUTPUT} asks 6902for substitution in the @file{intl/} and @file{po/} directories. 6903Note the @samp{.in} suffix used for @file{po/} only. This is because 6904the distributed file is really @file{po/Makefile.in.in}. 6905 6906If you have suppressed the @file{intl/} subdirectory by calling 6907@code{gettextize} without @samp{--intl} option, then you don't need to 6908add @code{intl/Makefile} to the @code{AC_OUTPUT} line. 6909 6910@end enumerate 6911 6912If, after doing the recommended modifications, a command like 6913@samp{aclocal -I m4} or @samp{autoconf} or @samp{autoreconf} fails with 6914a trace similar to this: 6915 6916@smallexample 6917configure.ac:44: warning: AC_COMPILE_IFELSE was called before AC_GNU_SOURCE 6918../../lib/autoconf/specific.m4:335: AC_GNU_SOURCE is expanded from... 6919m4/lock.m4:224: gl_LOCK is expanded from... 6920m4/gettext.m4:571: gt_INTL_SUBDIR_CORE is expanded from... 6921m4/gettext.m4:472: AM_INTL_SUBDIR is expanded from... 6922m4/gettext.m4:347: AM_GNU_GETTEXT is expanded from... 6923configure.ac:44: the top level 6924configure.ac:44: warning: AC_RUN_IFELSE was called before AC_GNU_SOURCE 6925@end smallexample 6926 6927@noindent 6928you need to add an explicit invocation of @samp{AC_GNU_SOURCE} in the 6929@file{configure.ac} file - after @samp{AC_PROG_CC} but before 6930@samp{AM_GNU_GETTEXT}, most likely very close to the @samp{AC_PROG_CC} 6931invocation. This is necessary because of ordering restrictions imposed 6932by GNU autoconf. 6933 6934@node config.guess, mkinstalldirs, configure.in, Adjusting Files 6935@subsection @file{config.guess}, @file{config.sub} at top level 6936 6937If you haven't suppressed the @file{intl/} subdirectory, 6938you need to add the GNU @file{config.guess} and @file{config.sub} files 6939to your distribution. They are needed because the @file{intl/} directory 6940has platform dependent support for determining the locale's character 6941encoding and therefore needs to identify the platform. 6942 6943You can obtain the newest version of @file{config.guess} and 6944@file{config.sub} from the CVS of the @samp{config} project at 6945@file{http://savannah.gnu.org/}. The commands to fetch them are 6946@smallexample 6947$ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess' 6948$ wget 'http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub' 6949@end smallexample 6950@noindent 6951Less recent versions are also contained in the GNU @code{automake} and 6952GNU @code{libtool} packages. 6953 6954Normally, @file{config.guess} and @file{config.sub} are put at the 6955top level of a distribution. But it is also possible to put them in a 6956subdirectory, altogether with other configuration support files like 6957@file{install-sh}, @file{ltconfig}, @file{ltmain.sh} or @file{missing}. 6958All you need to do, other than moving the files, is to add the following line 6959to your @file{configure.in}. 6960 6961@example 6962AC_CONFIG_AUX_DIR([@var{subdir}]) 6963@end example 6964 6965@node mkinstalldirs, aclocal, config.guess, Adjusting Files 6966@subsection @file{mkinstalldirs} at top level 6967@cindex @file{mkinstalldirs} file 6968 6969With earlier versions of GNU gettext, you needed to add the GNU 6970@file{mkinstalldirs} script to your distribution. This is not needed any 6971more. You can remove it if you not also using an automake version older than 6972automake 1.9. 6973 6974@node aclocal, acconfig, mkinstalldirs, Adjusting Files 6975@subsection @file{aclocal.m4} at top level 6976@cindex @file{aclocal.m4} file 6977 6978If you do not have an @file{aclocal.m4} file in your distribution, 6979the simplest is to concatenate the files @file{codeset.m4}, 6980@file{gettext.m4}, @file{glibc2.m4}, @file{glibc21.m4}, @file{iconv.m4}, 6981@file{intdiv0.m4}, @file{intl.m4}, @file{intldir.m4}, @file{intmax.m4}, 6982@file{inttypes_h.m4}, @file{inttypes-pri.m4}, @file{lcmessage.m4}, 6983@file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4}, @file{lock.m4}, 6984@file{longdouble.m4}, @file{longlong.m4}, @file{nls.m4}, @file{po.m4}, 6985@file{printf-posix.m4}, @file{progtest.m4}, @file{size_max.m4}, 6986@file{stdint_h.m4}, @file{uintmax_t.m4}, @file{ulonglong.m4}, 6987@file{visibility.m4}, @file{wchar_t.m4}, @file{wint_t.m4}, @file{xsize.m4} 6988from GNU @code{gettext}'s 6989@file{m4/} directory into a single file. If you have suppressed the 6990@file{intl/} directory, only @file{gettext.m4}, @file{iconv.m4}, 6991@file{lib-ld.m4}, @file{lib-link.m4}, @file{lib-prefix.m4}, 6992@file{nls.m4}, @file{po.m4}, @file{progtest.m4} need to be concatenated. 6993 6994If you are not using GNU @code{automake} 1.8 or newer, you will need to 6995add a file @file{mkdirp.m4} from a newer automake distribution to the 6996list of files above. 6997 6998If you already have an @file{aclocal.m4} file, then you will have 6999to merge the said macro files into your @file{aclocal.m4}. Note that if 7000you are upgrading from a previous release of GNU @code{gettext}, you 7001should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT}, 7002etc.), as they usually 7003change a little from one release of GNU @code{gettext} to the next. 7004Their contents may vary as we get more experience with strange systems 7005out there. 7006 7007If you are using GNU @code{automake} 1.5 or newer, it is enough to put 7008these macro files into a subdirectory named @file{m4/} and add the line 7009 7010@example 7011ACLOCAL_AMFLAGS = -I m4 7012@end example 7013 7014@noindent 7015to your top level @file{Makefile.am}. 7016 7017These macros check for the internationalization support functions 7018and related informations. Hopefully, once stabilized, these macros 7019might be integrated in the standard Autoconf set, because this 7020piece of @code{m4} code will be the same for all projects using GNU 7021@code{gettext}. 7022 7023@node acconfig, config.h.in, aclocal, Adjusting Files 7024@subsection @file{acconfig.h} at top level 7025@cindex @file{acconfig.h} file 7026 7027Earlier GNU @code{gettext} releases required to put definitions for 7028@code{ENABLE_NLS}, @code{HAVE_GETTEXT} and @code{HAVE_LC_MESSAGES}, 7029@code{HAVE_STPCPY}, @code{PACKAGE} and @code{VERSION} into an 7030@file{acconfig.h} file. This is not needed any more; you can remove 7031them from your @file{acconfig.h} file unless your package uses them 7032independently from the @file{intl/} directory. 7033 7034@node config.h.in, Makefile, acconfig, Adjusting Files 7035@subsection @file{config.h.in} at top level 7036@cindex @file{config.h.in} file 7037 7038The include file template that holds the C macros to be defined by 7039@code{configure} is usually called @file{config.h.in} and may be 7040maintained either manually or automatically. 7041 7042If @code{gettextize} has created an @file{intl/} directory, this file 7043must be called @file{config.h.in} and must be at the top level. If, 7044however, you have suppressed the @file{intl/} directory by calling 7045@code{gettextize} without @samp{--intl} option, then you can choose the 7046name of this file and its location freely. 7047 7048If it is maintained automatically, by use of the @samp{autoheader} 7049program, you need to do nothing about it. This is the case in particular 7050if you are using GNU @code{automake}. 7051 7052If it is maintained manually, and if @code{gettextize} has created an 7053@file{intl/} directory, you should switch to using @samp{autoheader}. 7054The list of C macros to be added for the sake of the @file{intl/} 7055directory is just too long to be maintained manually; it also changes 7056between different versions of GNU @code{gettext}. 7057 7058If it is maintained manually, and if on the other hand you have 7059suppressed the @file{intl/} directory by calling @code{gettextize} 7060without @samp{--intl} option, then you can get away by adding the 7061following lines to @file{config.h.in}: 7062 7063@example 7064/* Define to 1 if translation of program messages to the user's 7065 native language is requested. */ 7066#undef ENABLE_NLS 7067@end example 7068 7069@node Makefile, src/Makefile, config.h.in, Adjusting Files 7070@subsection @file{Makefile.in} at top level 7071 7072Here are a few modifications you need to make to your main, top-level 7073@file{Makefile.in} file. 7074 7075@enumerate 7076@item 7077Add the following lines near the beginning of your @file{Makefile.in}, 7078so the @samp{dist:} goal will work properly (as explained further down): 7079 7080@example 7081PACKAGE = @@PACKAGE@@ 7082VERSION = @@VERSION@@ 7083@end example 7084 7085@item 7086Add file @file{ABOUT-NLS} to the @code{DISTFILES} definition, so the file gets 7087distributed. 7088 7089@item 7090Wherever you process subdirectories in your @file{Makefile.in}, be sure 7091you also process the subdirectories @samp{intl} and @samp{po}. Special 7092rules in the @file{Makefiles} take care for the case where no 7093internationalization is wanted. 7094 7095If you are using Makefiles, either generated by automake, or hand-written 7096so they carefully follow the GNU coding standards, the effected goals for 7097which the new subdirectories must be handled include @samp{installdirs}, 7098@samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}. 7099 7100Here is an example of a canonical order of processing. In this 7101example, we also define @code{SUBDIRS} in @code{Makefile.in} for it 7102to be further used in the @samp{dist:} goal. 7103 7104@example 7105SUBDIRS = doc intl lib src po 7106@end example 7107 7108Note that you must arrange for @samp{make} to descend into the 7109@code{intl} directory before descending into other directories containing 7110code which make use of the @code{libintl.h} header file. For this 7111reason, here we mention @code{intl} before @code{lib} and @code{src}. 7112 7113@item 7114A delicate point is the @samp{dist:} goal, as both 7115@file{intl/Makefile} and @file{po/Makefile} will later assume that the 7116proper directory has been set up from the main @file{Makefile}. Here is 7117an example at what the @samp{dist:} goal might look like: 7118 7119@example 7120distdir = $(PACKAGE)-$(VERSION) 7121dist: Makefile 7122 rm -fr $(distdir) 7123 mkdir $(distdir) 7124 chmod 777 $(distdir) 7125 for file in $(DISTFILES); do \ 7126 ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \ 7127 done 7128 for subdir in $(SUBDIRS); do \ 7129 mkdir $(distdir)/$$subdir || exit 1; \ 7130 chmod 777 $(distdir)/$$subdir; \ 7131 (cd $$subdir && $(MAKE) $@@) || exit 1; \ 7132 done 7133 tar chozf $(distdir).tar.gz $(distdir) 7134 rm -fr $(distdir) 7135@end example 7136 7137@end enumerate 7138 7139Note that if you are using GNU @code{automake}, @file{Makefile.in} is 7140automatically generated from @file{Makefile.am}, and all needed changes 7141to @file{Makefile.am} are already made by running @samp{gettextize}. 7142 7143@node src/Makefile, lib/gettext.h, Makefile, Adjusting Files 7144@subsection @file{Makefile.in} in @file{src/} 7145 7146Some of the modifications made in the main @file{Makefile.in} will 7147also be needed in the @file{Makefile.in} from your package sources, 7148which we assume here to be in the @file{src/} subdirectory. Here are 7149all the modifications needed in @file{src/Makefile.in}: 7150 7151@enumerate 7152@item 7153In view of the @samp{dist:} goal, you should have these lines near the 7154beginning of @file{src/Makefile.in}: 7155 7156@example 7157PACKAGE = @@PACKAGE@@ 7158VERSION = @@VERSION@@ 7159@end example 7160 7161@item 7162If not done already, you should guarantee that @code{top_srcdir} 7163gets defined. This will serve for @code{cpp} include files. Just add 7164the line: 7165 7166@example 7167top_srcdir = @@top_srcdir@@ 7168@end example 7169 7170@item 7171You might also want to define @code{subdir} as @samp{src}, later 7172allowing for almost uniform @samp{dist:} goals in all your 7173@file{Makefile.in}. At list, the @samp{dist:} goal below assume that 7174you used: 7175 7176@example 7177subdir = src 7178@end example 7179 7180@item 7181The @code{main} function of your program will normally call 7182@code{bindtextdomain} (see @pxref{Triggering}), like this: 7183 7184@example 7185bindtextdomain (@var{PACKAGE}, LOCALEDIR); 7186textdomain (@var{PACKAGE}); 7187@end example 7188 7189To make LOCALEDIR known to the program, add the following lines to 7190@file{Makefile.in}: 7191 7192@example 7193datadir = @@datadir@@ 7194localedir = $(datadir)/locale 7195DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@ 7196@end example 7197 7198Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, thus 7199@code{$(localedir)} defaults to @samp{$(prefix)/share/locale}. 7200 7201@item 7202You should ensure that the final linking will use @code{@@LIBINTL@@} or 7203@code{@@LTLIBINTL@@} as a library. @code{@@LIBINTL@@} is for use without 7204@code{libtool}, @code{@@LTLIBINTL@@} is for use with @code{libtool}. An 7205easy way to achieve this is to manage that it gets into @code{LIBS}, like 7206this: 7207 7208@example 7209LIBS = @@LIBINTL@@ @@LIBS@@ 7210@end example 7211 7212In most packages internationalized with GNU @code{gettext}, one will 7213find a directory @file{lib/} in which a library containing some helper 7214functions will be build. (You need at least the few functions which the 7215GNU @code{gettext} Library itself needs.) However some of the functions 7216in the @file{lib/} also give messages to the user which of course should be 7217translated, too. Taking care of this, the support library (say 7218@file{libsupport.a}) should be placed before @code{@@LIBINTL@@} and 7219@code{@@LIBS@@} in the above example. So one has to write this: 7220 7221@example 7222LIBS = ../lib/libsupport.a @@LIBINTL@@ @@LIBS@@ 7223@end example 7224 7225@item 7226You should also ensure that directory @file{intl/} will be searched for 7227C preprocessor include files in all circumstances. So, you have to 7228manage so both @samp{-I../intl} and @samp{-I$(top_srcdir)/intl} will 7229be given to the C compiler. 7230 7231@item 7232Your @samp{dist:} goal has to conform with others. Here is a 7233reasonable definition for it: 7234 7235@example 7236distdir = ../$(PACKAGE)-$(VERSION)/$(subdir) 7237dist: Makefile $(DISTFILES) 7238 for file in $(DISTFILES); do \ 7239 ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \ 7240 done 7241@end example 7242 7243@end enumerate 7244 7245Note that if you are using GNU @code{automake}, @file{Makefile.in} is 7246automatically generated from @file{Makefile.am}, and the first three 7247changes and the last change are not necessary. The remaining needed 7248@file{Makefile.am} modifications are the following: 7249 7250@enumerate 7251@item 7252To make LOCALEDIR known to the program, add the following to 7253@file{Makefile.am}: 7254 7255@example 7256<module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\" 7257@end example 7258 7259@noindent 7260for each specific module or compilation unit, or 7261 7262@example 7263AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\" 7264@end example 7265 7266for all modules and compilation units together. Furthermore, add this 7267line to define @samp{localedir}: 7268 7269@example 7270localedir = $(datadir)/locale 7271@end example 7272 7273@item 7274To ensure that the final linking will use @code{@@LIBINTL@@} or 7275@code{@@LTLIBINTL@@} as a library, add the following to 7276@file{Makefile.am}: 7277 7278@example 7279<program>_LDADD = @@LIBINTL@@ 7280@end example 7281 7282@noindent 7283for each specific program, or 7284 7285@example 7286LDADD = @@LIBINTL@@ 7287@end example 7288 7289for all programs together. Remember that when you use @code{libtool} 7290to link a program, you need to use @@LTLIBINTL@@ instead of @@LIBINTL@@ 7291for that program. 7292 7293@item 7294If you have an @file{intl/} directory, whose contents is created by 7295@code{gettextize}, then to ensure that it will be searched for 7296C preprocessor include files in all circumstances, add something like 7297this to @file{Makefile.am}: 7298 7299@example 7300AM_CPPFLAGS = -I../intl -I$(top_srcdir)/intl 7301@end example 7302 7303@end enumerate 7304 7305@node lib/gettext.h, , src/Makefile, Adjusting Files 7306@subsection @file{gettext.h} in @file{lib/} 7307@cindex @file{gettext.h} file 7308@cindex turning off NLS support 7309@cindex disabling NLS 7310 7311Internationalization of packages, as provided by GNU @code{gettext}, is 7312optional. It can be turned off in two situations: 7313 7314@itemize @bullet 7315@item 7316When the installer has specified @samp{./configure --disable-nls}. This 7317can be useful when small binaries are more important than features, for 7318example when building utilities for boot diskettes. It can also be useful 7319in order to get some specific C compiler warnings about code quality with 7320some older versions of GCC (older than 3.0). 7321 7322@item 7323When the package does not include the @code{intl/} subdirectory, and the 7324libintl.h header (with its associated libintl library, if any) is not 7325already installed on the system, it is preferable that the package builds 7326without internationalization support, rather than to give a compilation 7327error. 7328@end itemize 7329 7330A C preprocessor macro can be used to detect these two cases. Usually, 7331when @code{libintl.h} was found and not explicitly disabled, the 7332@code{ENABLE_NLS} macro will be defined to 1 in the autoconf generated 7333configuration file (usually called @file{config.h}). In the two negative 7334situations, however, this macro will not be defined, thus it will evaluate 7335to 0 in C preprocessor expressions. 7336 7337@cindex include file @file{libintl.h} 7338@file{gettext.h} is a convenience header file for conditional use of 7339@file{<libintl.h>}, depending on the @code{ENABLE_NLS} macro. If 7340@code{ENABLE_NLS} is set, it includes @file{<libintl.h>}; otherwise it 7341defines no-op substitutes for the libintl.h functions. We recommend 7342the use of @code{"gettext.h"} over direct use of @file{<libintl.h>}, 7343so that portability to older systems is guaranteed and installers can 7344turn off internationalization if they want to. In the C code, you will 7345then write 7346 7347@example 7348#include "gettext.h" 7349@end example 7350 7351@noindent 7352instead of 7353 7354@example 7355#include <libintl.h> 7356@end example 7357 7358The location of @code{gettext.h} is usually in a directory containing 7359auxiliary include files. In many GNU packages, there is a directory 7360@file{lib/} containing helper functions; @file{gettext.h} fits there. 7361In other packages, it can go into the @file{src} directory. 7362 7363Do not install the @code{gettext.h} file in public locations. Every 7364package that needs it should contain a copy of it on its own. 7365 7366@node autoconf macros, CVS Issues, Adjusting Files, Maintainers 7367@section Autoconf macros for use in @file{configure.in} 7368@cindex autoconf macros for @code{gettext} 7369 7370GNU @code{gettext} installs macros for use in a package's 7371@file{configure.in} or @file{configure.ac}. 7372@xref{Top, , Introduction, autoconf, The Autoconf Manual}. 7373The primary macro is, of course, @code{AM_GNU_GETTEXT}. 7374 7375@menu 7376* AM_GNU_GETTEXT:: AM_GNU_GETTEXT in @file{gettext.m4} 7377* AM_GNU_GETTEXT_VERSION:: AM_GNU_GETTEXT_VERSION in @file{gettext.m4} 7378* AM_GNU_GETTEXT_NEED:: AM_GNU_GETTEXT_NEED in @file{gettext.m4} 7379* AM_GNU_GETTEXT_INTL_SUBDIR:: AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4} 7380* AM_PO_SUBDIRS:: AM_PO_SUBDIRS in @file{po.m4} 7381* AM_ICONV:: AM_ICONV in @file{iconv.m4} 7382@end menu 7383 7384@node AM_GNU_GETTEXT, AM_GNU_GETTEXT_VERSION, autoconf macros, autoconf macros 7385@subsection AM_GNU_GETTEXT in @file{gettext.m4} 7386 7387@amindex AM_GNU_GETTEXT 7388The @code{AM_GNU_GETTEXT} macro tests for the presence of the GNU gettext 7389function family in either the C library or a separate @code{libintl} 7390library (shared or static libraries are both supported) or in the package's 7391@file{intl/} directory. It also invokes @code{AM_PO_SUBDIRS}, thus preparing 7392the @file{po/} directories of the package for building. 7393 7394@code{AM_GNU_GETTEXT} accepts up to three optional arguments. The general 7395syntax is 7396 7397@example 7398AM_GNU_GETTEXT([@var{intlsymbol}], [@var{needsymbol}], [@var{intldir}]) 7399@end example 7400 7401@c We don't document @var{intlsymbol} = @samp{use-libtool} here, because 7402@c it is of no use for packages other than GNU gettext itself. (Such packages 7403@c are not allowed to install the shared libintl. But if they use libtool, 7404@c then it is in order to install shared libraries that depend on libintl.) 7405@var{intlsymbol} can be @samp{external} or @samp{no-libtool}. The default 7406(if it is not specified or empty) is @samp{no-libtool}. @var{intlsymbol} 7407should be @samp{external} for packages with no @file{intl/} directory. 7408For packages with an @file{intl/} directory, you can either use an 7409@var{intlsymbol} equal to @samp{no-libtool}, or you can use @samp{external} 7410and override by using the macro @code{AM_GNU_GETTEXT_INTL_SUBDIR} elsewhere. 7411The two ways to specify the existence of an @file{intl/} directory are 7412equivalent. At build time, a static library 7413@code{$(top_builddir)/intl/libintl.a} will then be created. 7414 7415If @var{needsymbol} is specified and is @samp{need-ngettext}, then GNU 7416gettext implementations (in libc or libintl) without the @code{ngettext()} 7417function will be ignored. If @var{needsymbol} is specified and is 7418@samp{need-formatstring-macros}, then GNU gettext implementations that don't 7419support the ISO C 99 @file{<inttypes.h>} formatstring macros will be ignored. 7420Only one @var{needsymbol} can be specified. These requirements can also be 7421specified by using the macro @code{AM_GNU_GETTEXT_NEED} elsewhere. To specify 7422more than one requirement, just specify the strongest one among them, or 7423invoke the @code{AM_GNU_GETTEXT_NEED} macro several times. The hierarchy 7424among the various alternatives is as follows: @samp{need-formatstring-macros} 7425implies @samp{need-ngettext}. 7426 7427@var{intldir} is used to find the intl libraries. If empty, the value 7428@samp{$(top_builddir)/intl/} is used. 7429 7430The @code{AM_GNU_GETTEXT} macro determines whether GNU gettext is 7431available and should be used. If so, it sets the @code{USE_NLS} variable 7432to @samp{yes}; it defines @code{ENABLE_NLS} to 1 in the autoconf 7433generated configuration file (usually called @file{config.h}); it sets 7434the variables @code{LIBINTL} and @code{LTLIBINTL} to the linker options 7435for use in a Makefile (@code{LIBINTL} for use without libtool, 7436@code{LTLIBINTL} for use with libtool); it adds an @samp{-I} option to 7437@code{CPPFLAGS} if necessary. In the negative case, it sets 7438@code{USE_NLS} to @samp{no}; it sets @code{LIBINTL} and @code{LTLIBINTL} 7439to empty and doesn't change @code{CPPFLAGS}. 7440 7441The complexities that @code{AM_GNU_GETTEXT} deals with are the following: 7442 7443@itemize @bullet 7444@item 7445@cindex @code{libintl} library 7446Some operating systems have @code{gettext} in the C library, for example 7447glibc. Some have it in a separate library @code{libintl}. GNU @code{libintl} 7448might have been installed as part of the GNU @code{gettext} package. 7449 7450@item 7451GNU @code{libintl}, if installed, is not necessarily already in the search 7452path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for 7453the library search path). 7454 7455@item 7456Except for glibc, the operating system's native @code{gettext} cannot 7457exploit the GNU mo files, doesn't have the necessary locale dependency 7458features, and cannot convert messages from the catalog's text encoding 7459to the user's locale encoding. 7460 7461@item 7462GNU @code{libintl}, if installed, is not necessarily already in the 7463run time library search path. To avoid the need for setting an environment 7464variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate 7465run time search path options to the @code{LIBINTL} and @code{LTLIBINTL} 7466variables. This works on most systems, but not on some operating systems 7467with limited shared library support, like SCO. 7468 7469@item 7470GNU @code{libintl} relies on POSIX/XSI @code{iconv}. The macro checks for 7471linker options needed to use iconv and appends them to the @code{LIBINTL} 7472and @code{LTLIBINTL} variables. 7473@end itemize 7474 7475@node AM_GNU_GETTEXT_VERSION, AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT, autoconf macros 7476@subsection AM_GNU_GETTEXT_VERSION in @file{gettext.m4} 7477 7478@amindex AM_GNU_GETTEXT_VERSION 7479The @code{AM_GNU_GETTEXT_VERSION} macro declares the version number of 7480the GNU gettext infrastructure that is used by the package. 7481 7482The use of this macro is optional; only the @code{autopoint} program makes 7483use of it (@pxref{CVS Issues}). 7484 7485 7486@node AM_GNU_GETTEXT_NEED, AM_GNU_GETTEXT_INTL_SUBDIR, AM_GNU_GETTEXT_VERSION, autoconf macros 7487@subsection AM_GNU_GETTEXT_NEED in @file{gettext.m4} 7488 7489@amindex AM_GNU_GETTEXT_NEED 7490The @code{AM_GNU_GETTEXT_NEED} macro declares a constraint regarding the 7491GNU gettext implementation. The syntax is 7492 7493@example 7494AM_GNU_GETTEXT_NEED([@var{needsymbol}]) 7495@end example 7496 7497If @var{needsymbol} is @samp{need-ngettext}, then GNU gettext implementations 7498(in libc or libintl) without the @code{ngettext()} function will be ignored. 7499If @var{needsymbol} is @samp{need-formatstring-macros}, then GNU gettext 7500implementations that don't support the ISO C 99 @file{<inttypes.h>} 7501formatstring macros will be ignored. 7502 7503The optional second argument of @code{AM_GNU_GETTEXT} is also taken into 7504account. 7505 7506The @code{AM_GNU_GETTEXT_NEED} invocations can occur before or after 7507the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter. 7508 7509@node AM_GNU_GETTEXT_INTL_SUBDIR, AM_PO_SUBDIRS, AM_GNU_GETTEXT_NEED, autoconf macros 7510@subsection AM_GNU_GETTEXT_INTL_SUBDIR in @file{intldir.m4} 7511 7512@amindex AM_GNU_GETTEXT_INTL_SUBDIR 7513The @code{AM_GNU_GETTEXT_INTL_SUBDIR} macro specifies that the 7514@code{AM_GNU_GETTEXT} macro, although invoked with the first argument 7515@samp{external}, should also prepare for building the @file{intl/} 7516subdirectory. 7517 7518The @code{AM_GNU_GETTEXT_INTL_SUBDIR} invocation can occur before or after 7519the @code{AM_GNU_GETTEXT} invocation; the order doesn't matter. 7520 7521The use of this macro requires GNU automake 1.10 or newer and 7522GNU autoconf 2.61 or newer. 7523 7524@node AM_PO_SUBDIRS, AM_ICONV, AM_GNU_GETTEXT_INTL_SUBDIR, autoconf macros 7525@subsection AM_PO_SUBDIRS in @file{po.m4} 7526 7527@amindex AM_PO_SUBDIRS 7528The @code{AM_PO_SUBDIRS} macro prepares the @file{po/} directories of the 7529package for building. This macro should be used in internationalized 7530programs written in other programming languages than C, C++, Objective C, 7531for example @code{sh}, @code{Python}, @code{Lisp}. See @ref{Programming 7532Languages} for a list of programming languages that support localization 7533through PO files. 7534 7535The @code{AM_PO_SUBDIRS} macro determines whether internationalization 7536should be used. If so, it sets the @code{USE_NLS} variable to @samp{yes}, 7537otherwise to @samp{no}. It also determines the right values for Makefile 7538variables in each @file{po/} directory. 7539 7540@node AM_ICONV, , AM_PO_SUBDIRS, autoconf macros 7541@subsection AM_ICONV in @file{iconv.m4} 7542 7543@amindex AM_ICONV 7544The @code{AM_ICONV} macro tests for the presence of the POSIX/XSI 7545@code{iconv} function family in either the C library or a separate 7546@code{libiconv} library. If found, it sets the @code{am_cv_func_iconv} 7547variable to @samp{yes}; it defines @code{HAVE_ICONV} to 1 in the autoconf 7548generated configuration file (usually called @file{config.h}); it defines 7549@code{ICONV_CONST} to @samp{const} or to empty, depending on whether the 7550second argument of @code{iconv()} is of type @samp{const char **} or 7551@samp{char **}; it sets the variables @code{LIBICONV} and 7552@code{LTLIBICONV} to the linker options for use in a Makefile 7553(@code{LIBICONV} for use without libtool, @code{LTLIBICONV} for use with 7554libtool); it adds an @samp{-I} option to @code{CPPFLAGS} if 7555necessary. If not found, it sets @code{LIBICONV} and @code{LTLIBICONV} to 7556empty and doesn't change @code{CPPFLAGS}. 7557 7558The complexities that @code{AM_ICONV} deals with are the following: 7559 7560@itemize @bullet 7561@item 7562@cindex @code{libiconv} library 7563Some operating systems have @code{iconv} in the C library, for example 7564glibc. Some have it in a separate library @code{libiconv}, for example 7565OSF/1 or FreeBSD. Regardless of the operating system, GNU @code{libiconv} 7566might have been installed. In that case, it should be used instead of the 7567operating system's native @code{iconv}. 7568 7569@item 7570GNU @code{libiconv}, if installed, is not necessarily already in the search 7571path (@code{CPPFLAGS} for the include file search path, @code{LDFLAGS} for 7572the library search path). 7573 7574@item 7575GNU @code{libiconv} is binary incompatible with some operating system's 7576native @code{iconv}, for example on FreeBSD. Use of an @file{iconv.h} 7577and @file{libiconv.so} that don't fit together would produce program 7578crashes. 7579 7580@item 7581GNU @code{libiconv}, if installed, is not necessarily already in the 7582run time library search path. To avoid the need for setting an environment 7583variable like @code{LD_LIBRARY_PATH}, the macro adds the appropriate 7584run time search path options to the @code{LIBICONV} variable. This works 7585on most systems, but not on some operating systems with limited shared 7586library support, like SCO. 7587@end itemize 7588 7589@file{iconv.m4} is distributed with the GNU gettext package because 7590@file{gettext.m4} relies on it. 7591 7592@node CVS Issues, Release Management, autoconf macros, Maintainers 7593@section Integrating with CVS 7594 7595Many projects use CVS for distributed development, version control and 7596source backup. This section gives some advice how to manage the uses 7597of @code{cvs}, @code{gettextize}, @code{autopoint} and @code{autoconf}. 7598 7599@menu 7600* Distributed CVS:: Avoiding version mismatch in distributed development 7601* Files under CVS:: Files to put under CVS version control 7602* autopoint Invocation:: Invoking the @code{autopoint} Program 7603@end menu 7604 7605@node Distributed CVS, Files under CVS, CVS Issues, CVS Issues 7606@subsection Avoiding version mismatch in distributed development 7607 7608In a project development with multiple developers, using CVS, there 7609should be a single developer who occasionally - when there is desire to 7610upgrade to a new @code{gettext} version - runs @code{gettextize} and 7611performs the changes listed in @ref{Adjusting Files}, and then commits 7612his changes to the CVS. 7613 7614It is highly recommended that all developers on a project use the same 7615version of GNU @code{gettext} in the package. In other words, if a 7616developer runs @code{gettextize}, he should go the whole way, make the 7617necessary remaining changes and commit his changes to the CVS. 7618Otherwise the following damages will likely occur: 7619 7620@itemize @bullet 7621@item 7622Apparent version mismatch between developers. Since some @code{gettext} 7623specific portions in @file{configure.in}, @file{configure.ac} and 7624@code{Makefile.am}, @code{Makefile.in} files depend on the @code{gettext} 7625version, the use of infrastructure files belonging to different 7626@code{gettext} versions can easily lead to build errors. 7627 7628@item 7629Hidden version mismatch. Such version mismatch can also lead to 7630malfunctioning of the package, that may be undiscovered by the developers. 7631The worst case of hidden version mismatch is that internationalization 7632of the package doesn't work at all. 7633 7634@item 7635Release risks. All developers implicitly perform constant testing on 7636a package. This is important in the days and weeks before a release. 7637If the guy who makes the release tar files uses a different version 7638of GNU @code{gettext} than the other developers, the distribution will 7639be less well tested than if all had been using the same @code{gettext} 7640version. For example, it is possible that a platform specific bug goes 7641undiscovered due to this constellation. 7642@end itemize 7643 7644@node Files under CVS, autopoint Invocation, Distributed CVS, CVS Issues 7645@subsection Files to put under CVS version control 7646 7647There are basically three ways to deal with generated files in the 7648context of a CVS repository, such as @file{configure} generated from 7649@file{configure.in}, @code{@var{parser}.c} generated from 7650@code{@var{parser}.y}, or @code{po/Makefile.in.in} autoinstalled by 7651@code{gettextize} or @code{autopoint}. 7652 7653@enumerate 7654@item 7655All generated files are always committed into the repository. 7656 7657@item 7658All generated files are committed into the repository occasionally, 7659for example each time a release is made. 7660 7661@item 7662Generated files are never committed into the repository. 7663@end enumerate 7664 7665Each of these three approaches has different advantages and drawbacks. 7666 7667@enumerate 7668@item 7669The advantage is that anyone can check out the CVS at any moment and 7670gets a working build. The drawbacks are: 1a. It requires some frequent 7671"cvs commit" actions by the maintainers. 1b. The repository grows in size 7672quite fast. 7673 7674@item 7675The advantage is that anyone can check out the CVS, and the usual 7676"./configure; make" will work. The drawbacks are: 2a. The one who 7677checks out the repository needs tools like GNU @code{automake}, 7678GNU @code{autoconf}, GNU @code{m4} installed in his PATH; sometimes 7679he even needs particular versions of them. 2b. When a release is made 7680and a commit is made on the generated files, the other developers get 7681conflicts on the generated files after doing "cvs update". Although 7682these conflicts are easy to resolve, they are annoying. 7683 7684@item 7685The advantage is less work for the maintainers. The drawback is that 7686anyone who checks out the CVS not only needs tools like GNU @code{automake}, 7687GNU @code{autoconf}, GNU @code{m4} installed in his PATH, but also that 7688he needs to perform a package specific pre-build step before being able 7689to "./configure; make". 7690@end enumerate 7691 7692For the first and second approach, all files modified or brought in 7693by the occasional @code{gettextize} invocation and update should be 7694committed into the CVS. 7695 7696For the third approach, the maintainer can omit from the CVS repository 7697all the files that @code{gettextize} mentions as "copy". Instead, he 7698adds to the @file{configure.in} or @file{configure.ac} a line of the 7699form 7700 7701@example 7702AM_GNU_GETTEXT_VERSION(@value{VERSION}) 7703@end example 7704 7705@noindent 7706and adds to the package's pre-build script an invocation of 7707@samp{autopoint}. For everyone who checks out the CVS, this 7708@code{autopoint} invocation will copy into the right place the 7709@code{gettext} infrastructure files that have been omitted from the CVS. 7710 7711The version number used as argument to @code{AM_GNU_GETTEXT_VERSION} is 7712the version of the @code{gettext} infrastructure that the package wants 7713to use. It is also the minimum version number of the @samp{autopoint} 7714program. So, if you write @code{AM_GNU_GETTEXT_VERSION(0.11.5)} then the 7715developers can have any version >= 0.11.5 installed; the package will work 7716with the 0.11.5 infrastructure in all developers' builds. When the 7717maintainer then runs gettextize from, say, version 0.12.1 on the package, 7718the occurrence of @code{AM_GNU_GETTEXT_VERSION(0.11.5)} will be changed 7719into @code{AM_GNU_GETTEXT_VERSION(0.12.1)}, and all other developers that 7720use the CVS will henceforth need to have GNU @code{gettext} 0.12.1 or newer 7721installed. 7722 7723@node autopoint Invocation, , Files under CVS, CVS Issues 7724@subsection Invoking the @code{autopoint} Program 7725 7726@include autopoint.texi 7727 7728@node Release Management, , CVS Issues, Maintainers 7729@section Creating a Distribution Tarball 7730 7731@cindex release 7732@cindex distribution tarball 7733In projects that use GNU @code{automake}, the usual commands for creating 7734a distribution tarball, @samp{make dist} or @samp{make distcheck}, 7735automatically update the PO files as needed. 7736 7737If GNU @code{automake} is not used, the maintainer needs to perform this 7738update before making a release: 7739 7740@example 7741$ ./configure 7742$ (cd po; make update-po) 7743$ make distclean 7744@end example 7745 7746@node Installers, Programming Languages, Maintainers, Top 7747@chapter The Installer's and Distributor's View 7748@cindex package installer's view of @code{gettext} 7749@cindex package distributor's view of @code{gettext} 7750@cindex package build and installation options 7751@cindex setting up @code{gettext} at build time 7752 7753By default, packages fully using GNU @code{gettext}, internally, 7754are installed in such a way that they to allow translation of 7755messages. At @emph{configuration} time, those packages should 7756automatically detect whether the underlying host system already provides 7757the GNU @code{gettext} functions. If not, 7758the GNU @code{gettext} library should be automatically prepared 7759and used. Installers may use special options at configuration 7760time for changing this behavior. The command @samp{./configure 7761--with-included-gettext} bypasses system @code{gettext} to 7762use the included GNU @code{gettext} instead, 7763while @samp{./configure --disable-nls} 7764produces programs totally unable to translate messages. 7765 7766@vindex LINGUAS@r{, environment variable} 7767Internationalized packages have usually many @file{@var{ll}.po} 7768files. Unless 7769translations are disabled, all those available are installed together 7770with the package. However, the environment variable @code{LINGUAS} 7771may be set, prior to configuration, to limit the installed set. 7772@code{LINGUAS} should then contain a space separated list of two-letter 7773codes, stating which languages are allowed. 7774 7775@node Programming Languages, Conclusion, Installers, Top 7776@chapter Other Programming Languages 7777 7778While the presentation of @code{gettext} focuses mostly on C and 7779implicitly applies to C++ as well, its scope is far broader than that: 7780Many programming languages, scripting languages and other textual data 7781like GUI resources or package descriptions can make use of the gettext 7782approach. 7783 7784@menu 7785* Language Implementors:: The Language Implementor's View 7786* Programmers for other Languages:: The Programmer's View 7787* Translators for other Languages:: The Translator's View 7788* Maintainers for other Languages:: The Maintainer's View 7789* List of Programming Languages:: Individual Programming Languages 7790* List of Data Formats:: Internationalizable Data 7791@end menu 7792 7793@node Language Implementors, Programmers for other Languages, Programming Languages, Programming Languages 7794@section The Language Implementor's View 7795@cindex programming languages 7796@cindex scripting languages 7797 7798All programming and scripting languages that have the notion of strings 7799are eligible to supporting @code{gettext}. Supporting @code{gettext} 7800means the following: 7801 7802@enumerate 7803@item 7804You should add to the language a syntax for translatable strings. In 7805principle, a function call of @code{gettext} would do, but a shorthand 7806syntax helps keeping the legibility of internationalized programs. For 7807example, in C we use the syntax @code{_("string")}, and in GNU awk we use 7808the shorthand @code{_"string"}. 7809 7810@item 7811You should arrange that evaluation of such a translatable string at 7812runtime calls the @code{gettext} function, or performs equivalent 7813processing. 7814 7815@item 7816Similarly, you should make the functions @code{ngettext}, 7817@code{dcgettext}, @code{dcngettext} available from within the language. 7818These functions are less often used, but are nevertheless necessary for 7819particular purposes: @code{ngettext} for correct plural handling, and 7820@code{dcgettext} and @code{dcngettext} for obeying other locale 7821environment variables than @code{LC_MESSAGES}, such as @code{LC_TIME} or 7822@code{LC_MONETARY}. For these latter functions, you need to make the 7823@code{LC_*} constants, available in the C header @code{<locale.h>}, 7824referenceable from within the language, usually either as enumeration 7825values or as strings. 7826 7827@item 7828You should allow the programmer to designate a message domain, either by 7829making the @code{textdomain} function available from within the 7830language, or by introducing a magic variable called @code{TEXTDOMAIN}. 7831Similarly, you should allow the programmer to designate where to search 7832for message catalogs, by providing access to the @code{bindtextdomain} 7833function. 7834 7835@item 7836You should either perform a @code{setlocale (LC_ALL, "")} call during 7837the startup of your language runtime, or allow the programmer to do so. 7838Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and 7839@code{LC_CTYPE} locale facets are not both set. 7840 7841@item 7842A programmer should have a way to extract translatable strings from a 7843program into a PO file. The GNU @code{xgettext} program is being 7844extended to support very different programming languages. Please 7845contact the GNU @code{gettext} maintainers to help them doing this. If 7846the string extractor is best integrated into your language's parser, GNU 7847@code{xgettext} can function as a front end to your string extractor. 7848 7849@item 7850The language's library should have a string formatting facility where 7851the arguments of a format string are denoted by a positional number or a 7852name. This is needed because for some languages and some messages with 7853more than one substitutable argument, the translation will need to 7854output the substituted arguments in different order. @xref{c-format Flag}. 7855 7856@item 7857If the language has more than one implementation, and not all of the 7858implementations use @code{gettext}, but the programs should be portable 7859across implementations, you should provide a no-i18n emulation, that 7860makes the other implementations accept programs written for yours, 7861without actually translating the strings. 7862 7863@item 7864To help the programmer in the task of marking translatable strings, 7865which is sometimes performed using the Emacs PO mode (@pxref{Marking}), 7866you are welcome to 7867contact the GNU @code{gettext} maintainers, so they can add support for 7868your language to @file{po-mode.el}. 7869@end enumerate 7870 7871On the implementation side, three approaches are possible, with 7872different effects on portability and copyright: 7873 7874@itemize @bullet 7875@item 7876You may integrate the GNU @code{gettext}'s @file{intl/} directory in 7877your package, as described in @ref{Maintainers}. This allows you to 7878have internationalization on all kinds of platforms. Note that when you 7879then distribute your package, it legally falls under the GNU General 7880Public License, and the GNU project will be glad about your contribution 7881to the Free Software pool. 7882 7883@item 7884You may link against GNU @code{gettext} functions if they are found in 7885the C library. For example, an autoconf test for @code{gettext()} and 7886@code{ngettext()} will detect this situation. For the moment, this test 7887will succeed on GNU systems and not on other platforms. No severe 7888copyright restrictions apply. 7889 7890@item 7891You may emulate or reimplement the GNU @code{gettext} functionality. 7892This has the advantage of full portability and no copyright 7893restrictions, but also the drawback that you have to reimplement the GNU 7894@code{gettext} features (such as the @code{LANGUAGE} environment 7895variable, the locale aliases database, the automatic charset conversion, 7896and plural handling). 7897@end itemize 7898 7899@node Programmers for other Languages, Translators for other Languages, Language Implementors, Programming Languages 7900@section The Programmer's View 7901 7902For the programmer, the general procedure is the same as for the C 7903language. The Emacs PO mode marking supports other languages, and the GNU 7904@code{xgettext} string extractor recognizes other languages based on the 7905file extension or a command-line option. In some languages, 7906@code{setlocale} is not needed because it is already performed by the 7907underlying language runtime. 7908 7909@node Translators for other Languages, Maintainers for other Languages, Programmers for other Languages, Programming Languages 7910@section The Translator's View 7911 7912The translator works exactly as in the C language case. The only 7913difference is that when translating format strings, she has to be aware 7914of the language's particular syntax for positional arguments in format 7915strings. 7916 7917@menu 7918* c-format:: C Format Strings 7919* objc-format:: Objective C Format Strings 7920* sh-format:: Shell Format Strings 7921* python-format:: Python Format Strings 7922* lisp-format:: Lisp Format Strings 7923* elisp-format:: Emacs Lisp Format Strings 7924* librep-format:: librep Format Strings 7925* scheme-format:: Scheme Format Strings 7926* smalltalk-format:: Smalltalk Format Strings 7927* java-format:: Java Format Strings 7928* csharp-format:: C# Format Strings 7929* awk-format:: awk Format Strings 7930* object-pascal-format:: Object Pascal Format Strings 7931* ycp-format:: YCP Format Strings 7932* tcl-format:: Tcl Format Strings 7933* perl-format:: Perl Format Strings 7934* php-format:: PHP Format Strings 7935* gcc-internal-format:: GCC internal Format Strings 7936* qt-format:: Qt Format Strings 7937* boost-format:: Boost Format Strings 7938@end menu 7939 7940@node c-format, objc-format, Translators for other Languages, Translators for other Languages 7941@subsection C Format Strings 7942 7943C format strings are described in POSIX (IEEE P1003.1 2001), section 7944XSH 3 fprintf(), 7945@uref{http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html}. 7946See also the fprintf() manual page, 7947@uref{http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php}, 7948@uref{http://informatik.fh-wuerzburg.de/student/i510/man/printf.html}. 7949 7950Although format strings with positions that reorder arguments, such as 7951 7952@example 7953"Only %2$d bytes free on '%1$s'." 7954@end example 7955 7956@noindent 7957which is semantically equivalent to 7958 7959@example 7960"'%s' has only %d bytes free." 7961@end example 7962 7963@noindent 7964are a POSIX/XSI feature and not specified by ISO C 99, translators can rely 7965on this reordering ability: On the few platforms where @code{printf()}, 7966@code{fprintf()} etc. don't support this feature natively, @file{libintl.a} 7967or @file{libintl.so} provides replacement functions, and GNU @code{<libintl.h>} 7968activates these replacement functions automatically. 7969 7970@cindex outdigits 7971@cindex Arabic digits 7972As a special feature for Farsi (Persian) and maybe Arabic, translators can 7973insert an @samp{I} flag into numeric format directives. For example, the 7974translation of @code{"%d"} can be @code{"%Id"}. The effect of this flag, 7975on systems with GNU @code{libc}, is that in the output, the ASCII digits are 7976replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale 7977facet. On other systems, the @code{gettext} function removes this flag, 7978so that it has no effect. 7979 7980Note that the programmer should @emph{not} put this flag into the 7981untranslated string. (Putting the @samp{I} format directive flag into an 7982@var{msgid} string would lead to undefined behaviour on platforms without 7983glibc when NLS is disabled.) 7984 7985@node objc-format, sh-format, c-format, Translators for other Languages 7986@subsection Objective C Format Strings 7987 7988Objective C format strings are like C format strings. They support an 7989additional format directive: "$@@", which when executed consumes an argument 7990of type @code{Object *}. 7991 7992@node sh-format, python-format, objc-format, Translators for other Languages 7993@subsection Shell Format Strings 7994 7995Shell format strings, as supported by GNU gettext and the @samp{envsubst} 7996program, are strings with references to shell variables in the form 7997@code{$@var{variable}} or @code{$@{@var{variable}@}}. References of the form 7998@code{$@{@var{variable}-@var{default}@}}, 7999@code{$@{@var{variable}:-@var{default}@}}, 8000@code{$@{@var{variable}=@var{default}@}}, 8001@code{$@{@var{variable}:=@var{default}@}}, 8002@code{$@{@var{variable}+@var{replacement}@}}, 8003@code{$@{@var{variable}:+@var{replacement}@}}, 8004@code{$@{@var{variable}?@var{ignored}@}}, 8005@code{$@{@var{variable}:?@var{ignored}@}}, 8006that would be valid inside shell scripts, are not supported. The 8007@var{variable} names must consist solely of alphanumeric or underscore 8008ASCII characters, not start with a digit and be nonempty; otherwise such 8009a variable reference is ignored. 8010 8011@node python-format, lisp-format, sh-format, Translators for other Languages 8012@subsection Python Format Strings 8013 8014Python format strings are described in 8015@w{Python Library reference} / 8016@w{2. Built-in Types, Exceptions and Functions} / 8017@w{2.2. Built-in Types} / 8018@w{2.2.6. Sequence Types} / 8019@w{2.2.6.2. String Formatting Operations}. 8020@uref{http://www.python.org/doc/2.2.1/lib/typesseq-strings.html}. 8021 8022@node lisp-format, elisp-format, python-format, Translators for other Languages 8023@subsection Lisp Format Strings 8024 8025Lisp format strings are described in the Common Lisp HyperSpec, 8026chapter 22.3 @w{Formatted Output}, 8027@uref{http://www.lisp.org/HyperSpec/Body/sec_22-3.html}. 8028 8029@node elisp-format, librep-format, lisp-format, Translators for other Languages 8030@subsection Emacs Lisp Format Strings 8031 8032Emacs Lisp format strings are documented in the Emacs Lisp reference, 8033section @w{Formatting Strings}, 8034@uref{http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75}. 8035Note that as of version 21, XEmacs supports numbered argument specifications 8036in format strings while FSF Emacs doesn't. 8037 8038@node librep-format, scheme-format, elisp-format, Translators for other Languages 8039@subsection librep Format Strings 8040 8041librep format strings are documented in the librep manual, section 8042@w{Formatted Output}, 8043@url{http://librep.sourceforge.net/librep-manual.html#Formatted%20Output}, 8044@url{http://www.gwinnup.org/research/docs/librep.html#SEC122}. 8045 8046@node scheme-format, smalltalk-format, librep-format, Translators for other Languages 8047@subsection Scheme Format Strings 8048 8049Scheme format strings are documented in the SLIB manual, section 8050@w{Format Specification}. 8051 8052@node smalltalk-format, java-format, scheme-format, Translators for other Languages 8053@subsection Smalltalk Format Strings 8054 8055Smalltalk format strings are described in the GNU Smalltalk documentation, 8056class @code{CharArray}, methods @samp{bindWith:} and 8057@samp{bindWithArguments:}. 8058@uref{http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238}. 8059In summary, a directive starts with @samp{%} and is followed by @samp{%} 8060or a nonzero digit (@samp{1} to @samp{9}). 8061 8062@node java-format, csharp-format, smalltalk-format, Translators for other Languages 8063@subsection Java Format Strings 8064 8065Java format strings are described in the JDK documentation for class 8066@code{java.text.MessageFormat}, 8067@uref{http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html}. 8068See also the ICU documentation 8069@uref{http://oss.software.ibm.com/icu/apiref/classMessageFormat.html}. 8070 8071@node csharp-format, awk-format, java-format, Translators for other Languages 8072@subsection C# Format Strings 8073 8074C# format strings are described in the .NET documentation for class 8075@code{System.String} and in 8076@uref{http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp}. 8077 8078@node awk-format, object-pascal-format, csharp-format, Translators for other Languages 8079@subsection awk Format Strings 8080 8081awk format strings are described in the gawk documentation, section 8082@w{Printf}, 8083@uref{http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf}. 8084 8085@node object-pascal-format, ycp-format, awk-format, Translators for other Languages 8086@subsection Object Pascal Format Strings 8087 8088Where is this documented? 8089 8090@node ycp-format, tcl-format, object-pascal-format, Translators for other Languages 8091@subsection YCP Format Strings 8092 8093YCP sformat strings are described in the libycp documentation 8094@uref{file:/usr/share/doc/packages/libycp/YCP-builtins.html}. 8095In summary, a directive starts with @samp{%} and is followed by @samp{%} 8096or a nonzero digit (@samp{1} to @samp{9}). 8097 8098@node tcl-format, perl-format, ycp-format, Translators for other Languages 8099@subsection Tcl Format Strings 8100 8101Tcl format strings are described in the @file{format.n} manual page, 8102@uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}. 8103 8104@node perl-format, php-format, tcl-format, Translators for other Languages 8105@subsection Perl Format Strings 8106 8107There are two kinds format strings in Perl: those acceptable to the 8108Perl built-in function @code{printf}, labelled as @samp{perl-format}, 8109and those acceptable to the @code{libintl-perl} function @code{__x}, 8110labelled as @samp{perl-brace-format}. 8111 8112Perl @code{printf} format strings are described in the @code{sprintf} 8113section of @samp{man perlfunc}. 8114 8115Perl brace format strings are described in the 8116@file{Locale::TextDomain(3pm)} manual page of the CPAN package 8117libintl-perl. In brief, Perl format uses placeholders put between 8118braces (@samp{@{} and @samp{@}}). The placeholder must have the syntax 8119of simple identifiers. 8120 8121@node php-format, gcc-internal-format, perl-format, Translators for other Languages 8122@subsection PHP Format Strings 8123 8124PHP format strings are described in the documentation of the PHP function 8125@code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or 8126@uref{http://www.php.net/manual/en/function.sprintf.php}. 8127 8128@node gcc-internal-format, qt-format, php-format, Translators for other Languages 8129@subsection GCC internal Format Strings 8130 8131These format strings are used inside the GCC sources. In such a format 8132string, a directive starts with @samp{%}, is optionally followed by a 8133size specifier @samp{l}, an optional flag @samp{+}, another optional flag 8134@samp{#}, and is finished by a specifier: @samp{%} denotes a literal 8135percent sign, @samp{c} denotes a character, @samp{s} denotes a string, 8136@samp{i} and @samp{d} denote an integer, @samp{o}, @samp{u}, @samp{x} 8137denote an unsigned integer, @samp{.*s} denotes a string preceded by a 8138width specification, @samp{H} denotes a @samp{location_t *} pointer, 8139@samp{D} denotes a general declaration, @samp{F} denotes a function 8140declaration, @samp{T} denotes a type, @samp{A} denotes a function argument, 8141@samp{C} denotes a tree code, @samp{E} denotes an expression, @samp{L} 8142denotes a programming language, @samp{O} denotes a binary operator, 8143@samp{P} denotes a function parameter, @samp{Q} denotes an assignment 8144operator, @samp{V} denotes a const/volatile qualifier. 8145 8146@node qt-format, boost-format, gcc-internal-format, Translators for other Languages 8147@subsection Qt Format Strings 8148 8149Qt format strings are described in the documentation of the QString class 8150@uref{file:/usr/lib/qt-3.0.5/doc/html/qstring.html}. 8151In summary, a directive consists of a @samp{%} followed by a digit. The same 8152directive cannot occur more than once in a format string. 8153 8154@node boost-format, , qt-format, Translators for other Languages 8155@subsection Boost Format Strings 8156 8157Boost format strings are described in the documentation of the 8158@code{boost::format} class, at 8159@uref{http://www.boost.org/libs/format/doc/format.html}. 8160In summary, a directive has either the same syntax as in a C format string, 8161such as @samp{%1$+5d}, or may be surrounded by vertical bars, such as 8162@samp{%|1$+5d|} or @samp{%|1$+5|}, or consists of just an argument number 8163between percent signs, such as @samp{%1%}. 8164 8165@node Maintainers for other Languages, List of Programming Languages, Translators for other Languages, Programming Languages 8166@section The Maintainer's View 8167 8168For the maintainer, the general procedure differs from the C language 8169case in two ways. 8170 8171@itemize @bullet 8172@item 8173For those languages that don't use GNU gettext, the @file{intl/} directory 8174is not needed and can be omitted. This means that the maintainer calls the 8175@code{gettextize} program without the @samp{--intl} option, and that he 8176invokes the @code{AM_GNU_GETTEXT} autoconf macro via 8177@samp{AM_GNU_GETTEXT([external])}. 8178 8179@item 8180If only a single programming language is used, the @code{XGETTEXT_OPTIONS} 8181variable in @file{po/Makevars} (@pxref{po/Makevars}) should be adjusted to 8182match the @code{xgettext} options for that particular programming language. 8183If the package uses more than one programming language with @code{gettext} 8184support, it becomes necessary to change the POT file construction rule 8185in @file{po/Makefile.in.in}. It is recommended to make one @code{xgettext} 8186invocation per programming language, each with the options appropriate for 8187that language, and to combine the resulting files using @code{msgcat}. 8188@end itemize 8189 8190@node List of Programming Languages, List of Data Formats, Maintainers for other Languages, Programming Languages 8191@section Individual Programming Languages 8192 8193@c Here is a list of programming languages, as used for Free Software projects 8194@c on SourceForge/Freshmeat, as of February 2002. Those supported by gettext 8195@c are marked with a star. 8196@c C 3580 * 8197@c Perl 1911 * 8198@c C++ 1379 * 8199@c Java 1200 * 8200@c PHP 1051 * 8201@c Python 613 * 8202@c Unix Shell 357 * 8203@c Tcl 266 * 8204@c SQL 174 8205@c JavaScript 118 8206@c Assembly 108 8207@c Scheme 51 8208@c Ruby 47 8209@c Lisp 45 * 8210@c Objective C 39 * 8211@c PL/SQL 29 8212@c Fortran 25 8213@c Ada 24 8214@c Delphi 22 8215@c Awk 19 * 8216@c Pascal 19 8217@c ML 19 8218@c Eiffel 17 8219@c Emacs-Lisp 14 * 8220@c Zope 14 8221@c ASP 12 8222@c Forth 12 8223@c Cold Fusion 10 8224@c Haskell 9 8225@c Visual Basic 9 8226@c C# 6 * 8227@c Smalltalk 6 * 8228@c Basic 5 8229@c Erlang 5 8230@c Modula 5 8231@c Object Pascal 5 * 8232@c Rexx 5 8233@c Dylan 4 8234@c Prolog 4 8235@c APL 3 8236@c PROGRESS 2 8237@c Euler 1 8238@c Euphoria 1 8239@c Pliant 1 8240@c Simula 1 8241@c XBasic 1 8242@c Logo 0 8243@c Other Scripting Engines 49 8244@c Other 116 8245 8246@menu 8247* C:: C, C++, Objective C 8248* sh:: sh - Shell Script 8249* bash:: bash - Bourne-Again Shell Script 8250* Python:: Python 8251* Common Lisp:: GNU clisp - Common Lisp 8252* clisp C:: GNU clisp C sources 8253* Emacs Lisp:: Emacs Lisp 8254* librep:: librep 8255* Scheme:: GNU guile - Scheme 8256* Smalltalk:: GNU Smalltalk 8257* Java:: Java 8258* C#:: C# 8259* gawk:: GNU awk 8260* Pascal:: Pascal - Free Pascal Compiler 8261* wxWidgets:: wxWidgets library 8262* YCP:: YCP - YaST2 scripting language 8263* Tcl:: Tcl - Tk's scripting language 8264* Perl:: Perl 8265* PHP:: PHP Hypertext Preprocessor 8266* Pike:: Pike 8267* GCC-source:: GNU Compiler Collection sources 8268@end menu 8269 8270@node C, sh, List of Programming Languages, List of Programming Languages 8271@subsection C, C++, Objective C 8272@cindex C and C-like languages 8273 8274@table @asis 8275@item RPMs 8276gcc, gpp, gobjc, glibc, gettext 8277 8278@item File extension 8279For C: @code{c}, @code{h}. 8280@*For C++: @code{C}, @code{c++}, @code{cc}, @code{cxx}, @code{cpp}, @code{hpp}. 8281@*For Objective C: @code{m}. 8282 8283@item String syntax 8284@code{"abc"} 8285 8286@item gettext shorthand 8287@code{_("abc")} 8288 8289@item gettext/ngettext functions 8290@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext}, 8291@code{dngettext}, @code{dcngettext} 8292 8293@item textdomain 8294@code{textdomain} function 8295 8296@item bindtextdomain 8297@code{bindtextdomain} function 8298 8299@item setlocale 8300Programmer must call @code{setlocale (LC_ALL, "")} 8301 8302@item Prerequisite 8303@code{#include <libintl.h>} 8304@*@code{#include <locale.h>} 8305@*@code{#define _(string) gettext (string)} 8306 8307@item Use or emulate GNU gettext 8308Use 8309 8310@item Extractor 8311@code{xgettext -k_} 8312 8313@item Formatting with positions 8314@code{fprintf "%2$d %1$d"} 8315@*In C++: @code{autosprintf "%2$d %1$d"} 8316(@pxref{Top, , Introduction, autosprintf, GNU autosprintf}) 8317 8318@item Portability 8319autoconf (gettext.m4) and #if ENABLE_NLS 8320 8321@item po-mode marking 8322yes 8323@end table 8324 8325The following examples are available in the @file{examples} directory: 8326@code{hello-c}, @code{hello-c-gnome}, @code{hello-c++}, @code{hello-c++-qt}, 8327@code{hello-c++-kde}, @code{hello-c++-gnome}, @code{hello-c++-wxwidgets}, 8328@code{hello-objc}, @code{hello-objc-gnustep}, @code{hello-objc-gnome}. 8329 8330@node sh, bash, C, List of Programming Languages 8331@subsection sh - Shell Script 8332@cindex shell scripts 8333 8334@table @asis 8335@item RPMs 8336bash, gettext 8337 8338@item File extension 8339@code{sh} 8340 8341@item String syntax 8342@code{"abc"}, @code{'abc'}, @code{abc} 8343 8344@item gettext shorthand 8345@code{"`gettext \"abc\"`"} 8346 8347@item gettext/ngettext functions 8348@pindex gettext 8349@pindex ngettext 8350@code{gettext}, @code{ngettext} programs 8351@*@code{eval_gettext}, @code{eval_ngettext} shell functions 8352 8353@item textdomain 8354@vindex TEXTDOMAIN@r{, environment variable} 8355environment variable @code{TEXTDOMAIN} 8356 8357@item bindtextdomain 8358@vindex TEXTDOMAINDIR@r{, environment variable} 8359environment variable @code{TEXTDOMAINDIR} 8360 8361@item setlocale 8362automatic 8363 8364@item Prerequisite 8365@code{. gettext.sh} 8366 8367@item Use or emulate GNU gettext 8368use 8369 8370@item Extractor 8371@code{xgettext} 8372 8373@item Formatting with positions 8374--- 8375 8376@item Portability 8377fully portable 8378 8379@item po-mode marking 8380--- 8381@end table 8382 8383An example is available in the @file{examples} directory: @code{hello-sh}. 8384 8385@menu 8386* Preparing Shell Scripts:: Preparing Shell Scripts for Internationalization 8387* gettext.sh:: Contents of @code{gettext.sh} 8388* gettext Invocation:: Invoking the @code{gettext} program 8389* ngettext Invocation:: Invoking the @code{ngettext} program 8390* envsubst Invocation:: Invoking the @code{envsubst} program 8391* eval_gettext Invocation:: Invoking the @code{eval_gettext} function 8392* eval_ngettext Invocation:: Invoking the @code{eval_ngettext} function 8393@end menu 8394 8395@node Preparing Shell Scripts, gettext.sh, sh, sh 8396@subsubsection Preparing Shell Scripts for Internationalization 8397@cindex preparing shell scripts for translation 8398 8399Preparing a shell script for internationalization is conceptually similar 8400to the steps described in @ref{Sources}. The concrete steps for shell 8401scripts are as follows. 8402 8403@enumerate 8404@item 8405Insert the line 8406 8407@smallexample 8408. gettext.sh 8409@end smallexample 8410 8411near the top of the script. @code{gettext.sh} is a shell function library 8412that provides the functions 8413@code{eval_gettext} (see @ref{eval_gettext Invocation}) and 8414@code{eval_ngettext} (see @ref{eval_ngettext Invocation}). 8415You have to ensure that @code{gettext.sh} can be found in the @code{PATH}. 8416 8417@item 8418Set and export the @code{TEXTDOMAIN} and @code{TEXTDOMAINDIR} environment 8419variables. Usually @code{TEXTDOMAIN} is the package or program name, and 8420@code{TEXTDOMAINDIR} is the absolute pathname corresponding to 8421@code{$prefix/share/locale}, where @code{$prefix} is the installation location. 8422 8423@smallexample 8424TEXTDOMAIN=@@PACKAGE@@ 8425export TEXTDOMAIN 8426TEXTDOMAINDIR=@@LOCALEDIR@@ 8427export TEXTDOMAINDIR 8428@end smallexample 8429 8430@item 8431Prepare the strings for translation, as described in @ref{Preparing Strings}. 8432 8433@item 8434Simplify translatable strings so that they don't contain command substitution 8435(@code{"`...`"} or @code{"$(...)"}), variable access with defaulting (like 8436@code{$@{@var{variable}-@var{default}@}}), access to positional arguments 8437(like @code{$0}, @code{$1}, ...) or highly volatile shell variables (like 8438@code{$?}). This can always be done through simple local code restructuring. 8439For example, 8440 8441@smallexample 8442echo "Usage: $0 [OPTION] FILE..." 8443@end smallexample 8444 8445becomes 8446 8447@smallexample 8448program_name=$0 8449echo "Usage: $program_name [OPTION] FILE..." 8450@end smallexample 8451 8452Similarly, 8453 8454@smallexample 8455echo "Remaining files: `ls | wc -l`" 8456@end smallexample 8457 8458becomes 8459 8460@smallexample 8461filecount="`ls | wc -l`" 8462echo "Remaining files: $filecount" 8463@end smallexample 8464 8465@item 8466For each translatable string, change the output command @samp{echo} or 8467@samp{$echo} to @samp{gettext} (if the string contains no references to 8468shell variables) or to @samp{eval_gettext} (if it refers to shell variables), 8469followed by a no-argument @samp{echo} command (to account for the terminating 8470newline). Similarly, for cases with plural handling, replace a conditional 8471@samp{echo} command with an invocation of @samp{ngettext} or 8472@samp{eval_ngettext}, followed by a no-argument @samp{echo} command. 8473 8474When doing this, you also need to add an extra backslash before the dollar 8475sign in references to shell variables, so that the @samp{eval_gettext} 8476function receives the translatable string before the variable values are 8477substituted into it. For example, 8478 8479@smallexample 8480echo "Remaining files: $filecount" 8481@end smallexample 8482 8483becomes 8484 8485@smallexample 8486eval_gettext "Remaining files: \$filecount"; echo 8487@end smallexample 8488 8489If the output command is not @samp{echo}, you can make it use @samp{echo} 8490nevertheless, through the use of backquotes. However, note that inside 8491backquotes, backslashes must be doubled to be effective (because the 8492backquoting eats one level of backslashes). For example, assuming that 8493@samp{error} is a shell function that signals an error, 8494 8495@smallexample 8496error "file not found: $filename" 8497@end smallexample 8498 8499is first transformed into 8500 8501@smallexample 8502error "`echo \"file not found: \$filename\"`" 8503@end smallexample 8504 8505which then becomes 8506 8507@smallexample 8508error "`eval_gettext \"file not found: \\\$filename\"`" 8509@end smallexample 8510@end enumerate 8511 8512@node gettext.sh, gettext Invocation, Preparing Shell Scripts, sh 8513@subsubsection Contents of @code{gettext.sh} 8514 8515@code{gettext.sh}, contained in the run-time package of GNU gettext, provides 8516the following: 8517 8518@itemize @bullet 8519@item $echo 8520The variable @code{echo} is set to a command that outputs its first argument 8521and a newline, without interpreting backslashes in the argument string. 8522 8523@item eval_gettext 8524See @ref{eval_gettext Invocation}. 8525 8526@item eval_ngettext 8527See @ref{eval_ngettext Invocation}. 8528@end itemize 8529 8530@node gettext Invocation, ngettext Invocation, gettext.sh, sh 8531@subsubsection Invoking the @code{gettext} program 8532 8533@include rt-gettext.texi 8534 8535@node ngettext Invocation, envsubst Invocation, gettext Invocation, sh 8536@subsubsection Invoking the @code{ngettext} program 8537 8538@include rt-ngettext.texi 8539 8540@node envsubst Invocation, eval_gettext Invocation, ngettext Invocation, sh 8541@subsubsection Invoking the @code{envsubst} program 8542 8543@include rt-envsubst.texi 8544 8545@node eval_gettext Invocation, eval_ngettext Invocation, envsubst Invocation, sh 8546@subsubsection Invoking the @code{eval_gettext} function 8547 8548@cindex @code{eval_gettext} function, usage 8549@example 8550eval_gettext @var{msgid} 8551@end example 8552 8553@cindex lookup message translation 8554This function outputs the native language translation of a textual message, 8555performing dollar-substitution on the result. Note that only shell variables 8556mentioned in @var{msgid} will be dollar-substituted in the result. 8557 8558@node eval_ngettext Invocation, , eval_gettext Invocation, sh 8559@subsubsection Invoking the @code{eval_ngettext} function 8560 8561@cindex @code{eval_ngettext} function, usage 8562@example 8563eval_ngettext @var{msgid} @var{msgid-plural} @var{count} 8564@end example 8565 8566@cindex lookup plural message translation 8567This function outputs the native language translation of a textual message 8568whose grammatical form depends on a number, performing dollar-substitution 8569on the result. Note that only shell variables mentioned in @var{msgid} or 8570@var{msgid-plural} will be dollar-substituted in the result. 8571 8572@node bash, Python, sh, List of Programming Languages 8573@subsection bash - Bourne-Again Shell Script 8574@cindex bash 8575 8576GNU @code{bash} 2.0 or newer has a special shorthand for translating a 8577string and substituting variable values in it: @code{$"msgid"}. But 8578the use of this construct is @strong{discouraged}, due to the security 8579holes it opens and due to its portability problems. 8580 8581The security holes of @code{$"..."} come from the fact that after looking up 8582the translation of the string, @code{bash} processes it like it processes 8583any double-quoted string: dollar and backquote processing, like @samp{eval} 8584does. 8585 8586@enumerate 8587@item 8588In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, 8589JOHAB, some double-byte characters have a second byte whose value is 8590@code{0x60}. For example, the byte sequence @code{\xe0\x60} is a single 8591character in these locales. Many versions of @code{bash} (all versions 8592up to bash-2.05, and newer versions on platforms without @code{mbsrtowcs()} 8593function) don't know about character boundaries and see a backquote character 8594where there is only a particular Chinese character. Thus it can start 8595executing part of the translation as a command list. This situation can occur 8596even without the translator being aware of it: if the translator provides 8597translations in the UTF-8 encoding, it is the @code{gettext()} function which 8598will, during its conversion from the translator's encoding to the user's 8599locale's encoding, produce the dangerous @code{\x60} bytes. 8600 8601@item 8602A translator could - voluntarily or inadvertently - use backquotes 8603@code{"`...`"} or dollar-parentheses @code{"$(...)"} in her translations. 8604The enclosed strings would be executed as command lists by the shell. 8605@end enumerate 8606 8607The portability problem is that @code{bash} must be built with 8608internationalization support; this is normally not the case on systems 8609that don't have the @code{gettext()} function in libc. 8610 8611@node Python, Common Lisp, bash, List of Programming Languages 8612@subsection Python 8613@cindex Python 8614 8615@table @asis 8616@item RPMs 8617python 8618 8619@item File extension 8620@code{py} 8621 8622@item String syntax 8623@code{'abc'}, @code{u'abc'}, @code{r'abc'}, @code{ur'abc'}, 8624@*@code{"abc"}, @code{u"abc"}, @code{r"abc"}, @code{ur"abc"}, 8625@*@code{'''abc'''}, @code{u'''abc'''}, @code{r'''abc'''}, @code{ur'''abc'''}, 8626@*@code{"""abc"""}, @code{u"""abc"""}, @code{r"""abc"""}, @code{ur"""abc"""} 8627 8628@item gettext shorthand 8629@code{_('abc')} etc. 8630 8631@item gettext/ngettext functions 8632@code{gettext.gettext}, @code{gettext.dgettext}, 8633@code{gettext.ngettext}, @code{gettext.dngettext}, 8634also @code{ugettext}, @code{ungettext} 8635 8636@item textdomain 8637@code{gettext.textdomain} function, or 8638@code{gettext.install(@var{domain})} function 8639 8640@item bindtextdomain 8641@code{gettext.bindtextdomain} function, or 8642@code{gettext.install(@var{domain},@var{localedir})} function 8643 8644@item setlocale 8645not used by the gettext emulation 8646 8647@item Prerequisite 8648@code{import gettext} 8649 8650@item Use or emulate GNU gettext 8651emulate 8652 8653@item Extractor 8654@code{xgettext} 8655 8656@item Formatting with positions 8657@code{'...%(ident)d...' % @{ 'ident': value @}} 8658 8659@item Portability 8660fully portable 8661 8662@item po-mode marking 8663--- 8664@end table 8665 8666An example is available in the @file{examples} directory: @code{hello-python}. 8667 8668@node Common Lisp, clisp C, Python, List of Programming Languages 8669@subsection GNU clisp - Common Lisp 8670@cindex Common Lisp 8671@cindex Lisp 8672@cindex clisp 8673 8674@table @asis 8675@item RPMs 8676clisp 2.28 or newer 8677 8678@item File extension 8679@code{lisp} 8680 8681@item String syntax 8682@code{"abc"} 8683 8684@item gettext shorthand 8685@code{(_ "abc")}, @code{(ENGLISH "abc")} 8686 8687@item gettext/ngettext functions 8688@code{i18n:gettext}, @code{i18n:ngettext} 8689 8690@item textdomain 8691@code{i18n:textdomain} 8692 8693@item bindtextdomain 8694@code{i18n:textdomaindir} 8695 8696@item setlocale 8697automatic 8698 8699@item Prerequisite 8700--- 8701 8702@item Use or emulate GNU gettext 8703use 8704 8705@item Extractor 8706@code{xgettext -k_ -kENGLISH} 8707 8708@item Formatting with positions 8709@code{format "~1@@*~D ~0@@*~D"} 8710 8711@item Portability 8712On platforms without gettext, no translation. 8713 8714@item po-mode marking 8715--- 8716@end table 8717 8718An example is available in the @file{examples} directory: @code{hello-clisp}. 8719 8720@node clisp C, Emacs Lisp, Common Lisp, List of Programming Languages 8721@subsection GNU clisp C sources 8722@cindex clisp C sources 8723 8724@table @asis 8725@item RPMs 8726clisp 8727 8728@item File extension 8729@code{d} 8730 8731@item String syntax 8732@code{"abc"} 8733 8734@item gettext shorthand 8735@code{ENGLISH ? "abc" : ""} 8736@*@code{GETTEXT("abc")} 8737@*@code{GETTEXTL("abc")} 8738 8739@item gettext/ngettext functions 8740@code{clgettext}, @code{clgettextl} 8741 8742@item textdomain 8743--- 8744 8745@item bindtextdomain 8746--- 8747 8748@item setlocale 8749automatic 8750 8751@item Prerequisite 8752@code{#include "lispbibl.c"} 8753 8754@item Use or emulate GNU gettext 8755use 8756 8757@item Extractor 8758@code{clisp-xgettext} 8759 8760@item Formatting with positions 8761@code{fprintf "%2$d %1$d"} 8762 8763@item Portability 8764On platforms without gettext, no translation. 8765 8766@item po-mode marking 8767--- 8768@end table 8769 8770@node Emacs Lisp, librep, clisp C, List of Programming Languages 8771@subsection Emacs Lisp 8772@cindex Emacs Lisp 8773 8774@table @asis 8775@item RPMs 8776emacs, xemacs 8777 8778@item File extension 8779@code{el} 8780 8781@item String syntax 8782@code{"abc"} 8783 8784@item gettext shorthand 8785@code{(_"abc")} 8786 8787@item gettext/ngettext functions 8788@code{gettext}, @code{dgettext} (xemacs only) 8789 8790@item textdomain 8791@code{domain} special form (xemacs only) 8792 8793@item bindtextdomain 8794@code{bind-text-domain} function (xemacs only) 8795 8796@item setlocale 8797automatic 8798 8799@item Prerequisite 8800--- 8801 8802@item Use or emulate GNU gettext 8803use 8804 8805@item Extractor 8806@code{xgettext} 8807 8808@item Formatting with positions 8809@code{format "%2$d %1$d"} 8810 8811@item Portability 8812Only XEmacs. Without @code{I18N3} defined at build time, no translation. 8813 8814@item po-mode marking 8815--- 8816@end table 8817 8818@node librep, Scheme, Emacs Lisp, List of Programming Languages 8819@subsection librep 8820@cindex @code{librep} Lisp 8821 8822@table @asis 8823@item RPMs 8824librep 0.15.3 or newer 8825 8826@item File extension 8827@code{jl} 8828 8829@item String syntax 8830@code{"abc"} 8831 8832@item gettext shorthand 8833@code{(_"abc")} 8834 8835@item gettext/ngettext functions 8836@code{gettext} 8837 8838@item textdomain 8839@code{textdomain} function 8840 8841@item bindtextdomain 8842@code{bindtextdomain} function 8843 8844@item setlocale 8845--- 8846 8847@item Prerequisite 8848@code{(require 'rep.i18n.gettext)} 8849 8850@item Use or emulate GNU gettext 8851use 8852 8853@item Extractor 8854@code{xgettext} 8855 8856@item Formatting with positions 8857@code{format "%2$d %1$d"} 8858 8859@item Portability 8860On platforms without gettext, no translation. 8861 8862@item po-mode marking 8863--- 8864@end table 8865 8866An example is available in the @file{examples} directory: @code{hello-librep}. 8867 8868@node Scheme, Smalltalk, librep, List of Programming Languages 8869@subsection GNU guile - Scheme 8870@cindex Scheme 8871@cindex guile 8872 8873@table @asis 8874@item RPMs 8875guile 8876 8877@item File extension 8878@code{scm} 8879 8880@item String syntax 8881@code{"abc"} 8882 8883@item gettext shorthand 8884@code{(_ "abc")} 8885 8886@item gettext/ngettext functions 8887@code{gettext}, @code{ngettext} 8888 8889@item textdomain 8890@code{textdomain} 8891 8892@item bindtextdomain 8893@code{bindtextdomain} 8894 8895@item setlocale 8896@code{(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))} 8897 8898@item Prerequisite 8899@code{(use-modules (ice-9 format))} 8900 8901@item Use or emulate GNU gettext 8902use 8903 8904@item Extractor 8905@code{xgettext -k_} 8906 8907@item Formatting with positions 8908@c @code{format "~1@@*~D ~0@@*~D~2@@*"}, requires @code{(use-modules (ice-9 format))} 8909@c not yet supported 8910--- 8911 8912@item Portability 8913On platforms without gettext, no translation. 8914 8915@item po-mode marking 8916--- 8917@end table 8918 8919An example is available in the @file{examples} directory: @code{hello-guile}. 8920 8921@node Smalltalk, Java, Scheme, List of Programming Languages 8922@subsection GNU Smalltalk 8923@cindex Smalltalk 8924 8925@table @asis 8926@item RPMs 8927smalltalk 8928 8929@item File extension 8930@code{st} 8931 8932@item String syntax 8933@code{'abc'} 8934 8935@item gettext shorthand 8936@code{NLS ? 'abc'} 8937 8938@item gettext/ngettext functions 8939@code{LcMessagesDomain>>#at:}, @code{LcMessagesDomain>>#at:plural:with:} 8940 8941@item textdomain 8942@code{LcMessages>>#domain:localeDirectory:} (returns a @code{LcMessagesDomain} 8943object).@* 8944Example: @code{I18N Locale default messages domain: 'gettext' localeDirectory: /usr/local/share/locale'} 8945 8946@item bindtextdomain 8947@code{LcMessages>>#domain:localeDirectory:}, see above. 8948 8949@item setlocale 8950Automatic if you use @code{I18N Locale default}. 8951 8952@item Prerequisite 8953@code{PackageLoader fileInPackage: 'I18N'!} 8954 8955@item Use or emulate GNU gettext 8956emulate 8957 8958@item Extractor 8959@code{xgettext} 8960 8961@item Formatting with positions 8962@code{'%1 %2' bindWith: 'Hello' with: 'world'} 8963 8964@item Portability 8965fully portable 8966 8967@item po-mode marking 8968--- 8969@end table 8970 8971An example is available in the @file{examples} directory: 8972@code{hello-smalltalk}. 8973 8974@node Java, C#, Smalltalk, List of Programming Languages 8975@subsection Java 8976@cindex Java 8977 8978@table @asis 8979@item RPMs 8980java, java2 8981 8982@item File extension 8983@code{java} 8984 8985@item String syntax 8986"abc" 8987 8988@item gettext shorthand 8989_("abc") 8990 8991@item gettext/ngettext functions 8992@code{GettextResource.gettext}, @code{GettextResource.ngettext} 8993 8994@item textdomain 8995---, use @code{ResourceBundle.getResource} instead 8996 8997@item bindtextdomain 8998---, use CLASSPATH instead 8999 9000@item setlocale 9001automatic 9002 9003@item Prerequisite 9004--- 9005 9006@item Use or emulate GNU gettext 9007---, uses a Java specific message catalog format 9008 9009@item Extractor 9010@code{xgettext -k_} 9011 9012@item Formatting with positions 9013@code{MessageFormat.format "@{1,number@} @{0,number@}"} 9014 9015@item Portability 9016fully portable 9017 9018@item po-mode marking 9019--- 9020@end table 9021 9022Before marking strings as internationalizable, uses of the string 9023concatenation operator need to be converted to @code{MessageFormat} 9024applications. For example, @code{"file "+filename+" not found"} becomes 9025@code{MessageFormat.format("file @{0@} not found", new Object[] @{ filename @})}. 9026Only after this is done, can the strings be marked and extracted. 9027 9028GNU gettext uses the native Java internationalization mechanism, namely 9029@code{ResourceBundle}s. There are two formats of @code{ResourceBundle}s: 9030@code{.properties} files and @code{.class} files. The @code{.properties} 9031format is a text file which the translators can directly edit, like PO 9032files, but which doesn't support plural forms. Whereas the @code{.class} 9033format is compiled from @code{.java} source code and can support plural 9034forms (provided it is accessed through an appropriate API, see below). 9035 9036To convert a PO file to a @code{.properties} file, the @code{msgcat} 9037program can be used with the option @code{--properties-output}. To convert 9038a @code{.properties} file back to a PO file, the @code{msgcat} program 9039can be used with the option @code{--properties-input}. All the tools 9040that manipulate PO files can work with @code{.properties} files as well, 9041if given the @code{--properties-input} and/or @code{--properties-output} 9042option. 9043 9044To convert a PO file to a ResourceBundle class, the @code{msgfmt} program 9045can be used with the option @code{--java} or @code{--java2}. To convert a 9046ResourceBundle back to a PO file, the @code{msgunfmt} program can be used 9047with the option @code{--java}. 9048 9049Two different programmatic APIs can be used to access ResourceBundles. 9050Note that both APIs work with all kinds of ResourceBundles, whether 9051GNU gettext generated classes, or other @code{.class} or @code{.properties} 9052files. 9053 9054@enumerate 9055@item 9056The @code{java.util.ResourceBundle} API. 9057 9058In particular, its @code{getString} function returns a string translation. 9059Note that a missing translation yields a @code{MissingResourceException}. 9060 9061This has the advantage of being the standard API. And it does not require 9062any additional libraries, only the @code{msgcat} generated @code{.properties} 9063files or the @code{msgfmt} generated @code{.class} files. But it cannot do 9064plural handling, even if the resource was generated by @code{msgfmt} from 9065a PO file with plural handling. 9066 9067@item 9068The @code{gnu.gettext.GettextResource} API. 9069 9070Reference documentation in Javadoc 1.1 style format 9071is in the @uref{javadoc1/tree.html,javadoc1 directory} and 9072in Javadoc 2 style format 9073in the @uref{javadoc2/index.html,javadoc2 directory}. 9074 9075Its @code{gettext} function returns a string translation. Note that when 9076a translation is missing, the @var{msgid} argument is returned unchanged. 9077 9078This has the advantage of having the @code{ngettext} function for plural 9079handling. 9080 9081@cindex @code{libintl} for Java 9082To use this API, one needs the @code{libintl.jar} file which is part of 9083the GNU gettext package and distributed under the LGPL. 9084@end enumerate 9085 9086Three examples, using the second API, are available in the @file{examples} 9087directory: @code{hello-java}, @code{hello-java-awt}, @code{hello-java-swing}. 9088 9089Now, to make use of the API and define a shorthand for @samp{getString}, 9090there are three idioms that you can choose from: 9091 9092@itemize @bullet 9093@item 9094(This one assumes Java 1.5 or newer.) 9095In a unique class of your project, say @samp{Util}, define a static variable 9096holding the @code{ResourceBundle} instance and the shorthand: 9097 9098@smallexample 9099private static ResourceBundle myResources = 9100 ResourceBundle.getBundle("domain-name"); 9101public static String _(String s) @{ 9102 return myResources.getString(s); 9103@} 9104@end smallexample 9105 9106All classes containing internationalized strings then contain 9107 9108@smallexample 9109import static Util._; 9110@end smallexample 9111 9112@noindent 9113and the shorthand is used like this: 9114 9115@smallexample 9116System.out.println(_("Operation completed.")); 9117@end smallexample 9118 9119@item 9120In a unique class of your project, say @samp{Util}, define a static variable 9121holding the @code{ResourceBundle} instance: 9122 9123@smallexample 9124public static ResourceBundle myResources = 9125 ResourceBundle.getBundle("domain-name"); 9126@end smallexample 9127 9128All classes containing internationalized strings then contain 9129 9130@smallexample 9131private static ResourceBundle res = Util.myResources; 9132private static String _(String s) @{ return res.getString(s); @} 9133@end smallexample 9134 9135@noindent 9136and the shorthand is used like this: 9137 9138@smallexample 9139System.out.println(_("Operation completed.")); 9140@end smallexample 9141 9142@item 9143You add a class with a very short name, say @samp{S}, containing just the 9144definition of the resource bundle and of the shorthand: 9145 9146@smallexample 9147public class S @{ 9148 public static ResourceBundle myResources = 9149 ResourceBundle.getBundle("domain-name"); 9150 public static String _(String s) @{ 9151 return myResources.getString(s); 9152 @} 9153@} 9154@end smallexample 9155 9156@noindent 9157and the shorthand is used like this: 9158 9159@smallexample 9160System.out.println(S._("Operation completed.")); 9161@end smallexample 9162@end itemize 9163 9164Which of the three idioms you choose, will depend on whether your project 9165requires portability to Java versions prior to Java 1.5 and, if so, whether 9166copying two lines of codes into every class is more acceptable in your project 9167than a class with a single-letter name. 9168 9169@node C#, gawk, Java, List of Programming Languages 9170@subsection C# 9171@cindex C# 9172 9173@table @asis 9174@item RPMs 9175pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer 9176 9177@item File extension 9178@code{cs} 9179 9180@item String syntax 9181@code{"abc"}, @code{@@"abc"} 9182 9183@item gettext shorthand 9184_("abc") 9185 9186@item gettext/ngettext functions 9187@code{GettextResourceManager.GetString}, 9188@code{GettextResourceManager.GetPluralString} 9189 9190@item textdomain 9191@code{new GettextResourceManager(domain)} 9192 9193@item bindtextdomain 9194---, compiled message catalogs are located in subdirectories of the directory 9195containing the executable 9196 9197@item setlocale 9198automatic 9199 9200@item Prerequisite 9201--- 9202 9203@item Use or emulate GNU gettext 9204---, uses a C# specific message catalog format 9205 9206@item Extractor 9207@code{xgettext -k_} 9208 9209@item Formatting with positions 9210@code{String.Format "@{1@} @{0@}"} 9211 9212@item Portability 9213fully portable 9214 9215@item po-mode marking 9216--- 9217@end table 9218 9219Before marking strings as internationalizable, uses of the string 9220concatenation operator need to be converted to @code{String.Format} 9221invocations. For example, @code{"file "+filename+" not found"} becomes 9222@code{String.Format("file @{0@} not found", filename)}. 9223Only after this is done, can the strings be marked and extracted. 9224 9225GNU gettext uses the native C#/.NET internationalization mechanism, namely 9226the classes @code{ResourceManager} and @code{ResourceSet}. Applications 9227use the @code{ResourceManager} methods to retrieve the native language 9228translation of strings. An instance of @code{ResourceSet} is the in-memory 9229representation of a message catalog file. The @code{ResourceManager} loads 9230and accesses @code{ResourceSet} instances as needed to look up the 9231translations. 9232 9233There are two formats of @code{ResourceSet}s that can be directly loaded by 9234the C# runtime: @code{.resources} files and @code{.dll} files. 9235 9236@itemize @bullet 9237@item 9238The @code{.resources} format is a binary file usually generated through the 9239@code{resgen} or @code{monoresgen} utility, but which doesn't support plural 9240forms. @code{.resources} files can also be embedded in .NET @code{.exe} files. 9241This only affects whether a file system access is performed to load the message 9242catalog; it doesn't affect the contents of the message catalog. 9243 9244@item 9245On the other hand, the @code{.dll} format is a binary file that is compiled 9246from @code{.cs} source code and can support plural forms (provided it is 9247accessed through the GNU gettext API, see below). 9248@end itemize 9249 9250Note that these .NET @code{.dll} and @code{.exe} files are not tied to a 9251particular platform; their file format and GNU gettext for C# can be used 9252on any platform. 9253 9254To convert a PO file to a @code{.resources} file, the @code{msgfmt} program 9255can be used with the option @samp{--csharp-resources}. To convert a 9256@code{.resources} file back to a PO file, the @code{msgunfmt} program can be 9257used with the option @samp{--csharp-resources}. You can also, in some cases, 9258use the @code{resgen} program (from the @code{pnet} package) or the 9259@code{monoresgen} program (from the @code{mono}/@code{mcs} package). These 9260programs can also convert a @code{.resources} file back to a PO file. But 9261beware: as of this writing (January 2004), the @code{monoresgen} converter is 9262quite buggy and the @code{resgen} converter ignores the encoding of the PO 9263files. 9264 9265To convert a PO file to a @code{.dll} file, the @code{msgfmt} program can be 9266used with the option @code{--csharp}. The result will be a @code{.dll} file 9267containing a subclass of @code{GettextResourceSet}, which itself is a subclass 9268of @code{ResourceSet}. To convert a @code{.dll} file containing a 9269@code{GettextResourceSet} subclass back to a PO file, the @code{msgunfmt} 9270program can be used with the option @code{--csharp}. 9271 9272The advantages of the @code{.dll} format over the @code{.resources} format 9273are: 9274 9275@enumerate 9276@item 9277Freedom to localize: Users can add their own translations to an application 9278after it has been built and distributed. Whereas when the programmer uses 9279a @code{ResourceManager} constructor provided by the system, the set of 9280@code{.resources} files for an application must be specified when the 9281application is built and cannot be extended afterwards. 9282@c If this were the only issue with the @code{.resources} format, one could 9283@c use the @code{ResourceManager.CreateFileBasedResourceManager} function. 9284 9285@item 9286Plural handling: A message catalog in @code{.dll} format supports the plural 9287handling function @code{GetPluralString}. Whereas @code{.resources} files can 9288only contain data and only support lookups that depend on a single string. 9289 9290@item 9291The @code{GettextResourceManager} that loads the message catalogs in 9292@code{.dll} format also provides for inheritance on a per-message basis. 9293For example, in Austrian (@code{de_AT}) locale, translations from the German 9294(@code{de}) message catalog will be used for messages not found in the 9295Austrian message catalog. This has the consequence that the Austrian 9296translators need only translate those few messages for which the translation 9297into Austrian differs from the German one. Whereas when working with 9298@code{.resources} files, each message catalog must provide the translations 9299of all messages by itself. 9300 9301@item 9302The @code{GettextResourceManager} that loads the message catalogs in 9303@code{.dll} format also provides for a fallback: The English @var{msgid} is 9304returned when no translation can be found. Whereas when working with 9305@code{.resources} files, a language-neutral @code{.resources} file must 9306explicitly be provided as a fallback. 9307@end enumerate 9308 9309On the side of the programmatic APIs, the programmer can use either the 9310standard @code{ResourceManager} API and the GNU @code{GettextResourceManager} 9311API. The latter is an extension of the former, because 9312@code{GettextResourceManager} is a subclass of @code{ResourceManager}. 9313 9314@enumerate 9315@item 9316The @code{System.Resources.ResourceManager} API. 9317 9318This API works with resources in @code{.resources} format. 9319 9320The creation of the @code{ResourceManager} is done through 9321@smallexample 9322 new ResourceManager(domainname, Assembly.GetExecutingAssembly()) 9323@end smallexample 9324@noindent 9325 9326The @code{GetString} function returns a string's translation. Note that this 9327function returns null when a translation is missing (i.e.@: not even found in 9328the fallback resource file). 9329 9330@item 9331The @code{GNU.Gettext.GettextResourceManager} API. 9332 9333This API works with resources in @code{.dll} format. 9334 9335Reference documentation is in the 9336@uref{csharpdoc/index.html,csharpdoc directory}. 9337 9338The creation of the @code{ResourceManager} is done through 9339@smallexample 9340 new GettextResourceManager(domainname) 9341@end smallexample 9342 9343The @code{GetString} function returns a string's translation. Note that when 9344a translation is missing, the @var{msgid} argument is returned unchanged. 9345 9346The @code{GetPluralString} function returns a string translation with plural 9347handling, like the @code{ngettext} function in C. 9348 9349@cindex @code{libintl} for C# 9350To use this API, one needs the @code{GNU.Gettext.dll} file which is part of 9351the GNU gettext package and distributed under the LGPL. 9352@end enumerate 9353 9354You can also mix both approaches: use the 9355@code{GNU.Gettext.GettextResourceManager} constructor, but otherwise use 9356only the @code{ResourceManager} type and only the @code{GetString} method. 9357This is appropriate when you want to profit from the tools for PO files, 9358but don't want to change an existing source code that uses 9359@code{ResourceManager} and don't (yet) need the @code{GetPluralString} method. 9360 9361Two examples, using the second API, are available in the @file{examples} 9362directory: @code{hello-csharp}, @code{hello-csharp-forms}. 9363 9364Now, to make use of the API and define a shorthand for @samp{GetString}, 9365there are two idioms that you can choose from: 9366 9367@itemize @bullet 9368@item 9369In a unique class of your project, say @samp{Util}, define a static variable 9370holding the @code{ResourceManager} instance: 9371 9372@smallexample 9373public static GettextResourceManager MyResourceManager = 9374 new GettextResourceManager("domain-name"); 9375@end smallexample 9376 9377All classes containing internationalized strings then contain 9378 9379@smallexample 9380private static GettextResourceManager Res = Util.MyResourceManager; 9381private static String _(String s) @{ return Res.GetString(s); @} 9382@end smallexample 9383 9384@noindent 9385and the shorthand is used like this: 9386 9387@smallexample 9388Console.WriteLine(_("Operation completed.")); 9389@end smallexample 9390 9391@item 9392You add a class with a very short name, say @samp{S}, containing just the 9393definition of the resource manager and of the shorthand: 9394 9395@smallexample 9396public class S @{ 9397 public static GettextResourceManager MyResourceManager = 9398 new GettextResourceManager("domain-name"); 9399 public static String _(String s) @{ 9400 return MyResourceManager.GetString(s); 9401 @} 9402@} 9403@end smallexample 9404 9405@noindent 9406and the shorthand is used like this: 9407 9408@smallexample 9409Console.WriteLine(S._("Operation completed.")); 9410@end smallexample 9411@end itemize 9412 9413Which of the two idioms you choose, will depend on whether copying two lines 9414of codes into every class is more acceptable in your project than a class 9415with a single-letter name. 9416 9417@node gawk, Pascal, C#, List of Programming Languages 9418@subsection GNU awk 9419@cindex awk 9420@cindex gawk 9421 9422@table @asis 9423@item RPMs 9424gawk 3.1 or newer 9425 9426@item File extension 9427@code{awk} 9428 9429@item String syntax 9430@code{"abc"} 9431 9432@item gettext shorthand 9433@code{_"abc"} 9434 9435@item gettext/ngettext functions 9436@code{dcgettext}, missing @code{dcngettext} in gawk-3.1.0 9437 9438@item textdomain 9439@code{TEXTDOMAIN} variable 9440 9441@item bindtextdomain 9442@code{bindtextdomain} function 9443 9444@item setlocale 9445automatic, but missing @code{setlocale (LC_MESSAGES, "")} in gawk-3.1.0 9446 9447@item Prerequisite 9448--- 9449 9450@item Use or emulate GNU gettext 9451use 9452 9453@item Extractor 9454@code{xgettext} 9455 9456@item Formatting with positions 9457@code{printf "%2$d %1$d"} (GNU awk only) 9458 9459@item Portability 9460On platforms without gettext, no translation. On non-GNU awks, you must 9461define @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain} 9462yourself. 9463 9464@item po-mode marking 9465--- 9466@end table 9467 9468An example is available in the @file{examples} directory: @code{hello-gawk}. 9469 9470@node Pascal, wxWidgets, gawk, List of Programming Languages 9471@subsection Pascal - Free Pascal Compiler 9472@cindex Pascal 9473@cindex Free Pascal 9474@cindex Object Pascal 9475 9476@table @asis 9477@item RPMs 9478fpk 9479 9480@item File extension 9481@code{pp}, @code{pas} 9482 9483@item String syntax 9484@code{'abc'} 9485 9486@item gettext shorthand 9487automatic 9488 9489@item gettext/ngettext functions 9490---, use @code{ResourceString} data type instead 9491 9492@item textdomain 9493---, use @code{TranslateResourceStrings} function instead 9494 9495@item bindtextdomain 9496---, use @code{TranslateResourceStrings} function instead 9497 9498@item setlocale 9499automatic, but uses only LANG, not LC_MESSAGES or LC_ALL 9500 9501@item Prerequisite 9502@code{@{$mode delphi@}} or @code{@{$mode objfpc@}}@*@code{uses gettext;} 9503 9504@item Use or emulate GNU gettext 9505emulate partially 9506 9507@item Extractor 9508@code{ppc386} followed by @code{xgettext} or @code{rstconv} 9509 9510@item Formatting with positions 9511@code{uses sysutils;}@*@code{format "%1:d %0:d"} 9512 9513@item Portability 9514? 9515 9516@item po-mode marking 9517--- 9518@end table 9519 9520The Pascal compiler has special support for the @code{ResourceString} data 9521type. It generates a @code{.rst} file. This is then converted to a 9522@code{.pot} file by use of @code{xgettext} or @code{rstconv}. At runtime, 9523a @code{.mo} file corresponding to translations of this @code{.pot} file 9524can be loaded using the @code{TranslateResourceStrings} function in the 9525@code{gettext} unit. 9526 9527An example is available in the @file{examples} directory: @code{hello-pascal}. 9528 9529@node wxWidgets, YCP, Pascal, List of Programming Languages 9530@subsection wxWidgets library 9531@cindex @code{wxWidgets} library 9532 9533@table @asis 9534@item RPMs 9535wxGTK, gettext 9536 9537@item File extension 9538@code{cpp} 9539 9540@item String syntax 9541@code{"abc"} 9542 9543@item gettext shorthand 9544@code{_("abc")} 9545 9546@item gettext/ngettext functions 9547@code{wxLocale::GetString}, @code{wxGetTranslation} 9548 9549@item textdomain 9550@code{wxLocale::AddCatalog} 9551 9552@item bindtextdomain 9553@code{wxLocale::AddCatalogLookupPathPrefix} 9554 9555@item setlocale 9556@code{wxLocale::Init}, @code{wxSetLocale} 9557 9558@item Prerequisite 9559@code{#include <wx/intl.h>} 9560 9561@item Use or emulate GNU gettext 9562emulate, see @code{include/wx/intl.h} and @code{src/common/intl.cpp} 9563 9564@item Extractor 9565@code{xgettext} 9566 9567@item Formatting with positions 9568wxString::Format supports positions if and only if the system has 9569@code{wprintf()}, @code{vswprintf()} functions and they support positions 9570according to POSIX. 9571 9572@item Portability 9573fully portable 9574 9575@item po-mode marking 9576yes 9577@end table 9578 9579@node YCP, Tcl, wxWidgets, List of Programming Languages 9580@subsection YCP - YaST2 scripting language 9581@cindex YCP 9582@cindex YaST2 scripting language 9583 9584@table @asis 9585@item RPMs 9586libycp, libycp-devel, yast2-core, yast2-core-devel 9587 9588@item File extension 9589@code{ycp} 9590 9591@item String syntax 9592@code{"abc"} 9593 9594@item gettext shorthand 9595@code{_("abc")} 9596 9597@item gettext/ngettext functions 9598@code{_()} with 1 or 3 arguments 9599 9600@item textdomain 9601@code{textdomain} statement 9602 9603@item bindtextdomain 9604--- 9605 9606@item setlocale 9607--- 9608 9609@item Prerequisite 9610--- 9611 9612@item Use or emulate GNU gettext 9613use 9614 9615@item Extractor 9616@code{xgettext} 9617 9618@item Formatting with positions 9619@code{sformat "%2 %1"} 9620 9621@item Portability 9622fully portable 9623 9624@item po-mode marking 9625--- 9626@end table 9627 9628An example is available in the @file{examples} directory: @code{hello-ycp}. 9629 9630@node Tcl, Perl, YCP, List of Programming Languages 9631@subsection Tcl - Tk's scripting language 9632@cindex Tcl 9633@cindex Tk's scripting language 9634 9635@table @asis 9636@item RPMs 9637tcl 9638 9639@item File extension 9640@code{tcl} 9641 9642@item String syntax 9643@code{"abc"} 9644 9645@item gettext shorthand 9646@code{[_ "abc"]} 9647 9648@item gettext/ngettext functions 9649@code{::msgcat::mc} 9650 9651@item textdomain 9652--- 9653 9654@item bindtextdomain 9655---, use @code{::msgcat::mcload} instead 9656 9657@item setlocale 9658automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL 9659 9660@item Prerequisite 9661@code{package require msgcat} 9662@*@code{proc _ @{s@} @{return [::msgcat::mc $s]@}} 9663 9664@item Use or emulate GNU gettext 9665---, uses a Tcl specific message catalog format 9666 9667@item Extractor 9668@code{xgettext -k_} 9669 9670@item Formatting with positions 9671@code{format "%2\$d %1\$d"} 9672 9673@item Portability 9674fully portable 9675 9676@item po-mode marking 9677--- 9678@end table 9679 9680Two examples are available in the @file{examples} directory: 9681@code{hello-tcl}, @code{hello-tcl-tk}. 9682 9683Before marking strings as internationalizable, substitutions of variables 9684into the string need to be converted to @code{format} applications. For 9685example, @code{"file $filename not found"} becomes 9686@code{[format "file %s not found" $filename]}. 9687Only after this is done, can the strings be marked and extracted. 9688After marking, this example becomes 9689@code{[format [_ "file %s not found"] $filename]} or 9690@code{[msgcat::mc "file %s not found" $filename]}. Note that the 9691@code{msgcat::mc} function implicitly calls @code{format} when more than one 9692argument is given. 9693 9694@node Perl, PHP, Tcl, List of Programming Languages 9695@subsection Perl 9696@cindex Perl 9697 9698@table @asis 9699@item RPMs 9700perl 9701 9702@item File extension 9703@code{pl}, @code{PL}, @code{pm}, @code{cgi} 9704 9705@item String syntax 9706@itemize @bullet 9707 9708@item @code{"abc"} 9709 9710@item @code{'abc'} 9711 9712@item @code{qq (abc)} 9713 9714@item @code{q (abc)} 9715 9716@item @code{qr /abc/} 9717 9718@item @code{qx (/bin/date)} 9719 9720@item @code{/pattern match/} 9721 9722@item @code{?pattern match?} 9723 9724@item @code{s/substitution/operators/} 9725 9726@item @code{$tied_hash@{"message"@}} 9727 9728@item @code{$tied_hash_reference->@{"message"@}} 9729 9730@item etc., issue the command @samp{man perlsyn} for details 9731 9732@end itemize 9733 9734@item gettext shorthand 9735@code{__} (double underscore) 9736 9737@item gettext/ngettext functions 9738@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext}, 9739@code{dngettext}, @code{dcngettext} 9740 9741@item textdomain 9742@code{textdomain} function 9743 9744@item bindtextdomain 9745@code{bindtextdomain} function 9746 9747@item bind_textdomain_codeset 9748@code{bind_textdomain_codeset} function 9749 9750@item setlocale 9751Use @code{setlocale (LC_ALL, "");} 9752 9753@item Prerequisite 9754@code{use POSIX;} 9755@*@code{use Locale::TextDomain;} (included in the package libintl-perl 9756which is available on the Comprehensive Perl Archive Network CPAN, 9757http://www.cpan.org/). 9758 9759@item Use or emulate GNU gettext 9760platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext 9761 9762@item Extractor 9763@code{xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k} 9764 9765@item Formatting with positions 9766Both kinds of format strings support formatting with positions. 9767@*@code{printf "%2\$d %1\$d", ...} (requires Perl 5.8.0 or newer) 9768@*@code{__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)} 9769 9770@item Portability 9771The @code{libintl-perl} package is platform independent but is not 9772part of the Perl core. The programmer is responsible for 9773providing a dummy implementation of the required functions if the 9774package is not installed on the target system. 9775 9776@item po-mode marking 9777--- 9778 9779@item Documentation 9780Included in @code{libintl-perl}, available on CPAN 9781(http://www.cpan.org/). 9782 9783@end table 9784 9785An example is available in the @file{examples} directory: @code{hello-perl}. 9786 9787@cindex marking Perl sources 9788 9789The @code{xgettext} parser backend for Perl differs significantly from 9790the parser backends for other programming languages, just as Perl 9791itself differs significantly from other programming languages. The 9792Perl parser backend offers many more string marking facilities than 9793the other backends but it also has some Perl specific limitations, the 9794worst probably being its imperfectness. 9795 9796@menu 9797* General Problems:: General Problems Parsing Perl Code 9798* Default Keywords:: Which Keywords Will xgettext Look For? 9799* Special Keywords:: How to Extract Hash Keys 9800* Quote-like Expressions:: What are Strings And Quote-like Expressions? 9801* Interpolation I:: Invalid String Interpolation 9802* Interpolation II:: Valid String Interpolation 9803* Parentheses:: When To Use Parentheses 9804* Long Lines:: How To Grok with Long Lines 9805* Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work 9806@end menu 9807 9808@node General Problems, Default Keywords, , Perl 9809@subsubsection General Problems Parsing Perl Code 9810 9811It is often heard that only Perl can parse Perl. This is not true. 9812Perl cannot be @emph{parsed} at all, it can only be @emph{executed}. 9813Perl has various built-in ambiguities that can only be resolved at runtime. 9814 9815The following example may illustrate one common problem: 9816 9817@example 9818print gettext "Hello World!"; 9819@end example 9820 9821Although this example looks like a bullet-proof case of a function 9822invocation, it is not: 9823 9824@example 9825open gettext, ">testfile" or die; 9826print gettext "Hello world!" 9827@end example 9828 9829In this context, the string @code{gettext} looks more like a 9830file handle. But not necessarily: 9831 9832@example 9833use Locale::Messages qw (:libintl_h); 9834open gettext ">testfile" or die; 9835print gettext "Hello world!"; 9836@end example 9837 9838Now, the file is probably syntactically incorrect, provided that the module 9839@code{Locale::Messages} found first in the Perl include path exports a 9840function @code{gettext}. But what if the module 9841@code{Locale::Messages} really looks like this? 9842 9843@example 9844use vars qw (*gettext); 9845 98461; 9847@end example 9848 9849In this case, the string @code{gettext} will be interpreted as a file 9850handle again, and the above example will create a file @file{testfile} 9851and write the string ``Hello world!'' into it. Even advanced 9852control flow analysis will not really help: 9853 9854@example 9855if (0.5 < rand) @{ 9856 eval "use Sane"; 9857@} else @{ 9858 eval "use InSane"; 9859@} 9860print gettext "Hello world!"; 9861@end example 9862 9863If the module @code{Sane} exports a function @code{gettext} that does 9864what we expect, and the module @code{InSane} opens a file for writing 9865and associates the @emph{handle} @code{gettext} with this output 9866stream, we are clueless again about what will happen at runtime. It is 9867completely unpredictable. The truth is that Perl has so many ways to 9868fill its symbol table at runtime that it is impossible to interpret a 9869particular piece of code without executing it. 9870 9871Of course, @code{xgettext} will not execute your Perl sources while 9872scanning for translatable strings, but rather use heuristics in order 9873to guess what you meant. 9874 9875Another problem is the ambiguity of the slash and the question mark. 9876Their interpretation depends on the context: 9877 9878@example 9879# A pattern match. 9880print "OK\n" if /foobar/; 9881 9882# A division. 9883print 1 / 2; 9884 9885# Another pattern match. 9886print "OK\n" if ?foobar?; 9887 9888# Conditional. 9889print $x ? "foo" : "bar"; 9890@end example 9891 9892The slash may either act as the division operator or introduce a 9893pattern match, whereas the question mark may act as the ternary 9894conditional operator or as a pattern match, too. Other programming 9895languages like @code{awk} present similar problems, but the consequences of a 9896misinterpretation are particularly nasty with Perl sources. In @code{awk} 9897for instance, a statement can never exceed one line and the parser 9898can recover from a parsing error at the next newline and interpret 9899the rest of the input stream correctly. Perl is different, as a 9900pattern match is terminated by the next appearance of the delimiter 9901(the slash or the question mark) in the input stream, regardless of 9902the semantic context. If a slash is really a division sign but 9903mis-interpreted as a pattern match, the rest of the input file is most 9904probably parsed incorrectly. 9905 9906If you find that @code{xgettext} fails to extract strings from 9907portions of your sources, you should therefore look out for slashes 9908and/or question marks preceding these sections. You may have come 9909across a bug in @code{xgettext}'s Perl parser (and of course you 9910should report that bug). In the meantime you should consider to 9911reformulate your code in a manner less challenging to @code{xgettext}. 9912 9913@node Default Keywords, Special Keywords, General Problems, Perl 9914@subsubsection Which keywords will xgettext look for? 9915@cindex Perl default keywords 9916 9917Unless you instruct @code{xgettext} otherwise by invoking it with one 9918of the options @code{--keyword} or @code{-k}, it will recognize the 9919following keywords in your Perl sources: 9920 9921@itemize @bullet 9922 9923@item @code{gettext} 9924 9925@item @code{dgettext} 9926 9927@item @code{dcgettext} 9928 9929@item @code{ngettext:1,2} 9930 9931The first (singular) and the second (plural) argument will be 9932extracted. 9933 9934@item @code{dngettext:1,2} 9935 9936The first (singular) and the second (plural) argument will be 9937extracted. 9938 9939@item @code{dcngettext:1,2} 9940 9941The first (singular) and the second (plural) argument will be 9942extracted. 9943 9944@item @code{gettext_noop} 9945 9946@item @code{%gettext} 9947 9948The keys of lookups into the hash @code{%gettext} will be extracted. 9949 9950@item @code{$gettext} 9951 9952The keys of lookups into the hash reference @code{$gettext} will be extracted. 9953 9954@end itemize 9955 9956@node Special Keywords, Quote-like Expressions, Default Keywords, Perl 9957@subsubsection How to Extract Hash Keys 9958@cindex Perl special keywords for hash-lookups 9959 9960Translating messages at runtime is normally performed by looking up the 9961original string in the translation database and returning the 9962translated version. The ``natural'' Perl implementation is a hash 9963lookup, and, of course, @code{xgettext} supports such practice. 9964 9965@example 9966print __"Hello world!"; 9967print $__@{"Hello world!"@}; 9968print $__->@{"Hello world!"@}; 9969print $$__@{"Hello world!"@}; 9970@end example 9971 9972The above four lines all do the same thing. The Perl module 9973@code{Locale::TextDomain} exports by default a hash @code{%__} that 9974is tied to the function @code{__()}. It also exports a reference 9975@code{$__} to @code{%__}. 9976 9977If an argument to the @code{xgettext} option @code{--keyword}, 9978resp. @code{-k} starts with a percent sign, the rest of the keyword is 9979interpreted as the name of a hash. If it starts with a dollar 9980sign, the rest of the keyword is interpreted as a reference to a 9981hash. 9982 9983Note that you can omit the quotation marks (single or double) around 9984the hash key (almost) whenever Perl itself allows it: 9985 9986@example 9987print $gettext@{Error@}; 9988@end example 9989 9990The exact rule is: You can omit the surrounding quotes, when the hash 9991key is a valid C (!) identifier, i.e.@: when it starts with an 9992underscore or an ASCII letter and is followed by an arbitrary number 9993of underscores, ASCII letters or digits. Other Unicode characters 9994are @emph{not} allowed, regardless of the @code{use utf8} pragma. 9995 9996@node Quote-like Expressions, Interpolation I, Special Keywords, Perl 9997@subsubsection What are Strings And Quote-like Expressions? 9998@cindex Perl quote-like expressions 9999 10000Perl offers a plethora of different string constructs. Those that can 10001be used either as arguments to functions or inside braces for hash 10002lookups are generally supported by @code{xgettext}. 10003 10004@itemize @bullet 10005@item @strong{double-quoted strings} 10006@* 10007@example 10008print gettext "Hello World!"; 10009@end example 10010 10011@item @strong{single-quoted strings} 10012@* 10013@example 10014print gettext 'Hello World!'; 10015@end example 10016 10017@item @strong{the operator qq} 10018@* 10019@example 10020print gettext qq |Hello World!|; 10021print gettext qq <E-mail: <guido\@@imperia.net>>; 10022@end example 10023 10024The operator @code{qq} is fully supported. You can use arbitrary 10025delimiters, including the four bracketing delimiters (round, angle, 10026square, curly) that nest. 10027 10028@item @strong{the operator q} 10029@* 10030@example 10031print gettext q |Hello World!|; 10032print gettext q <E-mail: <guido@@imperia.net>>; 10033@end example 10034 10035The operator @code{q} is fully supported. You can use arbitrary 10036delimiters, including the four bracketing delimiters (round, angle, 10037square, curly) that nest. 10038 10039@item @strong{the operator qx} 10040@* 10041@example 10042print gettext qx ;LANGUAGE=C /bin/date; 10043print gettext qx [/usr/bin/ls | grep '^[A-Z]*']; 10044@end example 10045 10046The operator @code{qx} is fully supported. You can use arbitrary 10047delimiters, including the four bracketing delimiters (round, angle, 10048square, curly) that nest. 10049 10050The example is actually a useless use of @code{gettext}. It will 10051invoke the @code{gettext} function on the output of the command 10052specified with the @code{qx} operator. The feature was included 10053in order to make the interface consistent (the parser will extract 10054all strings and quote-like expressions). 10055 10056@item @strong{here documents} 10057@* 10058@example 10059@group 10060print gettext <<'EOF'; 10061program not found in $PATH 10062EOF 10063 10064print ngettext <<EOF, <<"EOF"; 10065one file deleted 10066EOF 10067several files deleted 10068EOF 10069@end group 10070@end example 10071 10072Here-documents are recognized. If the delimiter is enclosed in single 10073quotes, the string is not interpolated. If it is enclosed in double 10074quotes or has no quotes at all, the string is interpolated. 10075 10076Delimiters that start with a digit are not supported! 10077 10078@end itemize 10079 10080@node Interpolation I, Interpolation II, Quote-like Expressions, Perl 10081@subsubsection Invalid Uses Of String Interpolation 10082@cindex Perl invalid string interpolation 10083 10084Perl is capable of interpolating variables into strings. This offers 10085some nice features in localized programs but can also lead to 10086problems. 10087 10088A common error is a construct like the following: 10089 10090@example 10091print gettext "This is the program $0!\n"; 10092@end example 10093 10094Perl will interpolate at runtime the value of the variable @code{$0} 10095into the argument of the @code{gettext()} function. Hence, this 10096argument is not a string constant but a variable argument (@code{$0} 10097is a global variable that holds the name of the Perl script being 10098executed). The interpolation is performed by Perl before the string 10099argument is passed to @code{gettext()} and will therefore depend on 10100the name of the script which can only be determined at runtime. 10101Consequently, it is almost impossible that a translation can be looked 10102up at runtime (except if, by accident, the interpolated string is found 10103in the message catalog). 10104 10105The @code{xgettext} program will therefore terminate parsing with a fatal 10106error if it encounters a variable inside of an extracted string. In 10107general, this will happen for all kinds of string interpolations that 10108cannot be safely performed at compile time. If you absolutely know 10109what you are doing, you can always circumvent this behavior: 10110 10111@example 10112my $know_what_i_am_doing = "This is program $0!\n"; 10113print gettext $know_what_i_am_doing; 10114@end example 10115 10116Since the parser only recognizes strings and quote-like expressions, 10117but not variables or other terms, the above construct will be 10118accepted. You will have to find another way, however, to let your 10119original string make it into your message catalog. 10120 10121If invoked with the option @code{--extract-all}, resp. @code{-a}, 10122variable interpolation will be accepted. Rationale: You will 10123generally use this option in order to prepare your sources for 10124internationalization. 10125 10126Please see the manual page @samp{man perlop} for details of strings and 10127quote-like expressions that are subject to interpolation and those 10128that are not. Safe interpolations (that will not lead to a fatal 10129error) are: 10130 10131@itemize @bullet 10132 10133@item the escape sequences @code{\t} (tab, HT, TAB), @code{\n} 10134(newline, NL), @code{\r} (return, CR), @code{\f} (form feed, FF), 10135@code{\b} (backspace, BS), @code{\a} (alarm, bell, BEL), and @code{\e} 10136(escape, ESC). 10137 10138@item octal chars, like @code{\033} 10139@* 10140Note that octal escapes in the range of 400-777 are translated into a 10141UTF-8 representation, regardless of the presence of the @code{use utf8} pragma. 10142 10143@item hex chars, like @code{\x1b} 10144 10145@item wide hex chars, like @code{\x@{263a@}} 10146@* 10147Note that this escape is translated into a UTF-8 representation, 10148regardless of the presence of the @code{use utf8} pragma. 10149 10150@item control chars, like @code{\c[} (CTRL-[) 10151 10152@item named Unicode chars, like @code{\N@{LATIN CAPITAL LETTER C WITH CEDILLA@}} 10153@* 10154Note that this escape is translated into a UTF-8 representation, 10155regardless of the presence of the @code{use utf8} pragma. 10156@end itemize 10157 10158The following escapes are considered partially safe: 10159 10160@itemize @bullet 10161 10162@item @code{\l} lowercase next char 10163 10164@item @code{\u} uppercase next char 10165 10166@item @code{\L} lowercase till \E 10167 10168@item @code{\U} uppercase till \E 10169 10170@item @code{\E} end case modification 10171 10172@item @code{\Q} quote non-word characters till \E 10173 10174@end itemize 10175 10176These escapes are only considered safe if the string consists of 10177ASCII characters only. Translation of characters outside the range 10178defined by ASCII is locale-dependent and can actually only be performed 10179at runtime; @code{xgettext} doesn't do these locale-dependent translations 10180at extraction time. 10181 10182Except for the modifier @code{\Q}, these translations, albeit valid, 10183are generally useless and only obfuscate your sources. If a 10184translation can be safely performed at compile time you can just as 10185well write what you mean. 10186 10187@node Interpolation II, Parentheses, Interpolation I, Perl 10188@subsubsection Valid Uses Of String Interpolation 10189@cindex Perl valid string interpolation 10190 10191Perl is often used to generate sources for other programming languages 10192or arbitrary file formats. Web applications that output HTML code 10193make a prominent example for such usage. 10194 10195You will often come across situations where you want to intersperse 10196code written in the target (programming) language with translatable 10197messages, like in the following HTML example: 10198 10199@example 10200print gettext <<EOF; 10201<h1>My Homepage</h1> 10202<script language="JavaScript"><!-- 10203for (i = 0; i < 100; ++i) @{ 10204 alert ("Thank you so much for visiting my homepage!"); 10205@} 10206//--></script> 10207EOF 10208@end example 10209 10210The parser will extract the entire here document, and it will appear 10211entirely in the resulting PO file, including the JavaScript snippet 10212embedded in the HTML code. If you exaggerate with constructs like 10213the above, you will run the risk that the translators of your package 10214will look out for a less challenging project. You should consider an 10215alternative expression here: 10216 10217@example 10218print <<EOF; 10219<h1>$gettext@{"My Homepage"@}</h1> 10220<script language="JavaScript"><!-- 10221for (i = 0; i < 100; ++i) @{ 10222 alert ("$gettext@{'Thank you so much for visiting my homepage!'@}"); 10223@} 10224//--></script> 10225EOF 10226@end example 10227 10228Only the translatable portions of the code will be extracted here, and 10229the resulting PO file will begrudgingly improve in terms of readability. 10230 10231You can interpolate hash lookups in all strings or quote-like 10232expressions that are subject to interpolation (see the manual page 10233@samp{man perlop} for details). Double interpolation is invalid, however: 10234 10235@example 10236# TRANSLATORS: Replace "the earth" with the name of your planet. 10237print gettext qq@{Welcome to $gettext->@{"the earth"@}@}; 10238@end example 10239 10240The @code{qq}-quoted string is recognized as an argument to @code{xgettext} in 10241the first place, and checked for invalid variable interpolation. The 10242dollar sign of hash-dereferencing will therefore terminate the parser 10243with an ``invalid interpolation'' error. 10244 10245It is valid to interpolate hash lookups in regular expressions: 10246 10247@example 10248if ($var =~ /$gettext@{"the earth"@}/) @{ 10249 print gettext "Match!\n"; 10250@} 10251s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g; 10252@end example 10253 10254@node Parentheses, Long Lines, Interpolation II, Perl 10255@subsubsection When To Use Parentheses 10256@cindex Perl parentheses 10257 10258In Perl, parentheses around function arguments are mostly optional. 10259@code{xgettext} will always assume that all 10260recognized keywords (except for hashes and hash references) are names 10261of properly prototyped functions, and will (hopefully) only require 10262parentheses where Perl itself requires them. All constructs in the 10263following example are therefore ok to use: 10264 10265@example 10266@group 10267print gettext ("Hello World!\n"); 10268print gettext "Hello World!\n"; 10269print dgettext ($package => "Hello World!\n"); 10270print dgettext $package, "Hello World!\n"; 10271 10272# The "fat comma" => turns the left-hand side argument into a 10273# single-quoted string! 10274print dgettext smellovision => "Hello World!\n"; 10275 10276# The following assignment only works with prototyped functions. 10277# Otherwise, the functions will act as "greedy" list operators and 10278# eat up all following arguments. 10279my $anonymous_hash = @{ 10280 planet => gettext "earth", 10281 cakes => ngettext "one cake", "several cakes", $n, 10282 still => $works, 10283@}; 10284# The same without fat comma: 10285my $other_hash = @{ 10286 'planet', gettext "earth", 10287 'cakes', ngettext "one cake", "several cakes", $n, 10288 'still', $works, 10289@}; 10290 10291# Parentheses are only significant for the first argument. 10292print dngettext 'package', ("one cake", "several cakes", $n), $discarded; 10293@end group 10294@end example 10295 10296@node Long Lines, Perl Pitfalls, Parentheses, Perl 10297@subsubsection How To Grok with Long Lines 10298@cindex Perl long lines 10299 10300The necessity of long messages can often lead to a cumbersome or 10301unreadable coding style. Perl has several options that may prevent 10302you from writing unreadable code, and 10303@code{xgettext} does its best to do likewise. This is where the dot 10304operator (the string concatenation operator) may come in handy: 10305 10306@example 10307@group 10308print gettext ("This is a very long" 10309 . " message that is still" 10310 . " readable, because" 10311 . " it is split into" 10312 . " multiple lines.\n"); 10313@end group 10314@end example 10315 10316Perl is smart enough to concatenate these constant string fragments 10317into one long string at compile time, and so is 10318@code{xgettext}. You will only find one long message in the resulting 10319POT file. 10320 10321Note that the future Perl 6 will probably use the underscore 10322(@samp{_}) as the string concatenation operator, and the dot 10323(@samp{.}) for dereferencing. This new syntax is not yet supported by 10324@code{xgettext}. 10325 10326If embedded newline characters are not an issue, or even desired, you 10327may also insert newline characters inside quoted strings wherever you 10328feel like it: 10329 10330@example 10331@group 10332print gettext ("<em>In HTML output 10333embedded newlines are generally no 10334problem, since adjacent whitespace 10335is always rendered into a single 10336space character.</em>"); 10337@end group 10338@end example 10339 10340You may also consider to use here documents: 10341 10342@example 10343@group 10344print gettext <<EOF; 10345<em>In HTML output 10346embedded newlines are generally no 10347problem, since adjacent whitespace 10348is always rendered into a single 10349space character.</em> 10350EOF 10351@end group 10352@end example 10353 10354Please do not forget that the line breaks are real, i.e.@: they 10355translate into newline characters that will consequently show up in 10356the resulting POT file. 10357 10358@node Perl Pitfalls, , Long Lines, Perl 10359@subsubsection Bugs, Pitfalls, And Things That Do Not Work 10360@cindex Perl pitfalls 10361 10362The foregoing sections should have proven that 10363@code{xgettext} is quite smart in extracting translatable strings from 10364Perl sources. Yet, some more or less exotic constructs that could be 10365expected to work, actually do not work. 10366 10367One of the more relevant limitations can be found in the 10368implementation of variable interpolation inside quoted strings. Only 10369simple hash lookups can be used there: 10370 10371@example 10372print <<EOF; 10373$gettext@{"The dot operator" 10374 . " does not work" 10375 . "here!"@} 10376Likewise, you cannot @@@{[ gettext ("interpolate function calls") ]@} 10377inside quoted strings or quote-like expressions. 10378EOF 10379@end example 10380 10381This is valid Perl code and will actually trigger invocations of the 10382@code{gettext} function at runtime. Yet, the Perl parser in 10383@code{xgettext} will fail to recognize the strings. A less obvious 10384example can be found in the interpolation of regular expressions: 10385 10386@example 10387s/<!--START_OF_WEEK-->/gettext ("Sunday")/e; 10388@end example 10389 10390The modifier @code{e} will cause the substitution to be interpreted as 10391an evaluable statement. Consequently, at runtime the function 10392@code{gettext()} is called, but again, the parser fails to extract the 10393string ``Sunday''. Use a temporary variable as a simple workaround if 10394you really happen to need this feature: 10395 10396@example 10397my $sunday = gettext "Sunday"; 10398s/<!--START_OF_WEEK-->/$sunday/; 10399@end example 10400 10401Hash slices would also be handy but are not recognized: 10402 10403@example 10404my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday', 10405 'Thursday', 'Friday', 'Saturday'@}; 10406# Or even: 10407@@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday 10408 Friday Saturday) @}; 10409@end example 10410 10411This is perfectly valid usage of the tied hash @code{%gettext} but the 10412strings are not recognized and therefore will not be extracted. 10413 10414Another caveat of the current version is its rudimentary support for 10415non-ASCII characters in identifiers. You may encounter serious 10416problems if you use identifiers with characters outside the range of 10417'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'. 10418 10419Maybe some of these missing features will be implemented in future 10420versions, but since you can always make do without them at minimal effort, 10421these todos have very low priority. 10422 10423A nasty problem are brace format strings that already contain braces 10424as part of the normal text, for example the usage strings typically 10425encountered in programs: 10426 10427@example 10428die "usage: $0 @{OPTIONS@} FILENAME...\n"; 10429@end example 10430 10431If you want to internationalize this code with Perl brace format strings, 10432you will run into a problem: 10433 10434@example 10435die __x ("usage: @{program@} @{OPTIONS@} FILENAME...\n", program => $0); 10436@end example 10437 10438Whereas @samp{@{program@}} is a placeholder, @samp{@{OPTIONS@}} 10439is not and should probably be translated. Yet, there is no way to teach 10440the Perl parser in @code{xgettext} to recognize the first one, and leave 10441the other one alone. 10442 10443There are two possible work-arounds for this problem. If you are 10444sure that your program will run under Perl 5.8.0 or newer (these 10445Perl versions handle positional parameters in @code{printf()}) or 10446if you are sure that the translator will not have to reorder the arguments 10447in her translation -- for example if you have only one brace placeholder 10448in your string, or if it describes a syntax, like in this one --, you can 10449mark the string as @code{no-perl-brace-format} and use @code{printf()}: 10450 10451@example 10452# xgettext: no-perl-brace-format 10453die sprintf ("usage: %s @{OPTIONS@} FILENAME...\n", $0); 10454@end example 10455 10456If you want to use the more portable Perl brace format, you will have to do 10457put placeholders in place of the literal braces: 10458 10459@example 10460die __x ("usage: @{program@} @{[@}OPTIONS@{]@} FILENAME...\n", 10461 program => $0, '[' => '@{', ']' => '@}'); 10462@end example 10463 10464Perl brace format strings know no escaping mechanism. No matter how this 10465escaping mechanism looked like, it would either give the programmer a 10466hard time, make translating Perl brace format strings heavy-going, or 10467result in a performance penalty at runtime, when the format directives 10468get executed. Most of the time you will happily get along with 10469@code{printf()} for this special case. 10470 10471@node PHP, Pike, Perl, List of Programming Languages 10472@subsection PHP Hypertext Preprocessor 10473@cindex PHP 10474 10475@table @asis 10476@item RPMs 10477mod_php4, mod_php4-core, phpdoc 10478 10479@item File extension 10480@code{php}, @code{php3}, @code{php4} 10481 10482@item String syntax 10483@code{"abc"}, @code{'abc'} 10484 10485@item gettext shorthand 10486@code{_("abc")} 10487 10488@item gettext/ngettext functions 10489@code{gettext}, @code{dgettext}, @code{dcgettext}; starting with PHP 4.2.0 10490also @code{ngettext}, @code{dngettext}, @code{dcngettext} 10491 10492@item textdomain 10493@code{textdomain} function 10494 10495@item bindtextdomain 10496@code{bindtextdomain} function 10497 10498@item setlocale 10499Programmer must call @code{setlocale (LC_ALL, "")} 10500 10501@item Prerequisite 10502--- 10503 10504@item Use or emulate GNU gettext 10505use 10506 10507@item Extractor 10508@code{xgettext} 10509 10510@item Formatting with positions 10511@code{printf "%2\$d %1\$d"} 10512 10513@item Portability 10514On platforms without gettext, the functions are not available. 10515 10516@item po-mode marking 10517--- 10518@end table 10519 10520An example is available in the @file{examples} directory: @code{hello-php}. 10521 10522@node Pike, GCC-source, PHP, List of Programming Languages 10523@subsection Pike 10524@cindex Pike 10525 10526@table @asis 10527@item RPMs 10528roxen 10529 10530@item File extension 10531@code{pike} 10532 10533@item String syntax 10534@code{"abc"} 10535 10536@item gettext shorthand 10537--- 10538 10539@item gettext/ngettext functions 10540@code{gettext}, @code{dgettext}, @code{dcgettext} 10541 10542@item textdomain 10543@code{textdomain} function 10544 10545@item bindtextdomain 10546@code{bindtextdomain} function 10547 10548@item setlocale 10549@code{setlocale} function 10550 10551@item Prerequisite 10552@code{import Locale.Gettext;} 10553 10554@item Use or emulate GNU gettext 10555use 10556 10557@item Extractor 10558--- 10559 10560@item Formatting with positions 10561--- 10562 10563@item Portability 10564On platforms without gettext, the functions are not available. 10565 10566@item po-mode marking 10567--- 10568@end table 10569 10570@node GCC-source, , Pike, List of Programming Languages 10571@subsection GNU Compiler Collection sources 10572@cindex GCC-source 10573 10574@table @asis 10575@item RPMs 10576gcc 10577 10578@item File extension 10579@code{c}, @code{h}. 10580 10581@item String syntax 10582@code{"abc"} 10583 10584@item gettext shorthand 10585@code{_("abc")} 10586 10587@item gettext/ngettext functions 10588@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext}, 10589@code{dngettext}, @code{dcngettext} 10590 10591@item textdomain 10592@code{textdomain} function 10593 10594@item bindtextdomain 10595@code{bindtextdomain} function 10596 10597@item setlocale 10598Programmer must call @code{setlocale (LC_ALL, "")} 10599 10600@item Prerequisite 10601@code{#include "intl.h"} 10602 10603@item Use or emulate GNU gettext 10604Use 10605 10606@item Extractor 10607@code{xgettext -k_} 10608 10609@item Formatting with positions 10610--- 10611 10612@item Portability 10613Uses autoconf macros 10614 10615@item po-mode marking 10616yes 10617@end table 10618 10619@c This is the template for new languages. 10620@ignore 10621 10622@ node 10623@ subsection 10624 10625@table @asis 10626@item RPMs 10627 10628@item File extension 10629 10630@item String syntax 10631 10632@item gettext shorthand 10633 10634@item gettext/ngettext functions 10635 10636@item textdomain 10637 10638@item bindtextdomain 10639 10640@item setlocale 10641 10642@item Prerequisite 10643 10644@item Use or emulate GNU gettext 10645 10646@item Extractor 10647 10648@item Formatting with positions 10649 10650@item Portability 10651 10652@item po-mode marking 10653@end table 10654 10655@end ignore 10656 10657@node List of Data Formats, , List of Programming Languages, Programming Languages 10658@section Internationalizable Data 10659 10660Here is a list of other data formats which can be internationalized 10661using GNU gettext. 10662 10663@menu 10664* POT:: POT - Portable Object Template 10665* RST:: Resource String Table 10666* Glade:: Glade - GNOME user interface description 10667@end menu 10668 10669@node POT, RST, List of Data Formats, List of Data Formats 10670@subsection POT - Portable Object Template 10671 10672@table @asis 10673@item RPMs 10674gettext 10675 10676@item File extension 10677@code{pot}, @code{po} 10678 10679@item Extractor 10680@code{xgettext} 10681@end table 10682 10683@node RST, Glade, POT, List of Data Formats 10684@subsection Resource String Table 10685@cindex RST 10686 10687@table @asis 10688@item RPMs 10689fpk 10690 10691@item File extension 10692@code{rst} 10693 10694@item Extractor 10695@code{xgettext}, @code{rstconv} 10696@end table 10697 10698@node Glade, , RST, List of Data Formats 10699@subsection Glade - GNOME user interface description 10700 10701@table @asis 10702@item RPMs 10703glade, libglade, glade2, libglade2, intltool 10704 10705@item File extension 10706@code{glade}, @code{glade2} 10707 10708@item Extractor 10709@code{xgettext}, @code{libglade-xgettext}, @code{xml-i18n-extract}, @code{intltool-extract} 10710@end table 10711 10712@c This is the template for new data formats. 10713@ignore 10714 10715@ node 10716@ subsection 10717 10718@table @asis 10719@item RPMs 10720 10721@item File extension 10722 10723@item Extractor 10724@end table 10725 10726@end ignore 10727 10728@node Conclusion, Language Codes, Programming Languages, Top 10729@chapter Concluding Remarks 10730 10731We would like to conclude this GNU @code{gettext} manual by presenting 10732an history of the Translation Project so far. We finally give 10733a few pointers for those who want to do further research or readings 10734about Native Language Support matters. 10735 10736@menu 10737* History:: History of GNU @code{gettext} 10738* References:: Related Readings 10739@end menu 10740 10741@node History, References, Conclusion, Conclusion 10742@section History of GNU @code{gettext} 10743@cindex history of GNU @code{gettext} 10744 10745Internationalization concerns and algorithms have been informally 10746and casually discussed for years in GNU, sometimes around GNU 10747@code{libc}, maybe around the incoming @code{Hurd}, or otherwise 10748(nobody clearly remembers). And even then, when the work started for 10749real, this was somewhat independently of these previous discussions. 10750 10751This all began in July 1994, when Patrick D'Cruze had the idea and 10752initiative of internationalizing version 3.9.2 of GNU @code{fileutils}. 10753He then asked Jim Meyering, the maintainer, how to get those changes 10754folded into an official release. That first draft was full of 10755@code{#ifdef}s and somewhat disconcerting, and Jim wanted to find 10756nicer ways. Patrick and Jim shared some tries and experimentations 10757in this area. Then, feeling that this might eventually have a deeper 10758impact on GNU, Jim wanted to know what standards were, and contacted 10759Richard Stallman, who very quickly and verbally described an overall 10760design for what was meant to become @code{glocale}, at that time. 10761 10762Jim implemented @code{glocale} and got a lot of exhausting feedback 10763from Patrick and Richard, of course, but also from Mitchum DSouza 10764(who wrote a @code{catgets}-like package), Roland McGrath, maybe David 10765MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and 10766pulling in various directions, not always compatible, to the extent 10767that after a couple of test releases, @code{glocale} was torn apart. 10768In particular, Paul Eggert -- always keeping an eye on developments 10769in Solaris -- advocated the use of the @code{gettext} API over 10770@code{glocale}'s @code{catgets}-based API. 10771 10772While Jim took some distance and time and became dad for a second 10773time, Roland wanted to get GNU @code{libc} internationalized, and 10774got Ulrich Drepper involved in that project. Instead of starting 10775from @code{glocale}, Ulrich rewrote something from scratch, but 10776more conforming to the set of guidelines who emerged out of the 10777@code{glocale} effort. Then, Ulrich got people from the previous 10778forum to involve themselves into this new project, and the switch 10779from @code{glocale} to what was first named @code{msgutils}, renamed 10780@code{nlsutils}, and later @code{gettext}, became officially accepted 10781by Richard in May 1995 or so. 10782 10783Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext} 10784in April 1995. The first official release of the package, including 10785PO mode, occurred in July 1995, and was numbered 0.7. Other people 10786contributed to the effort by providing a discussion forum around 10787Ulrich, writing little pieces of code, or testing. These are quoted 10788in the @code{THANKS} file which comes with the GNU @code{gettext} 10789distribution. 10790 10791While this was being done, Fran@,{c}ois adapted half a dozen of 10792GNU packages to @code{glocale} first, then later to @code{gettext}, 10793putting them in pretest, so providing along the way an effective 10794user environment for fine tuning the evolving tools. He also took 10795the responsibility of organizing and coordinating the Translation 10796Project. After nearly a year of informal exchanges between people from 10797many countries, translator teams started to exist in May 1995, through 10798the creation and support by Patrick D'Cruze of twenty unmoderated 10799mailing lists for that many native languages, and two moderated 10800lists: one for reaching all teams at once, the other for reaching 10801all willing maintainers of internationalized free software packages. 10802 10803Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration 10804of Greg McGary, as a kind of contribution to Ulrich's package. 10805He also gave a hand with the GNU @code{gettext} Texinfo manual. 10806 10807In 1997, Ulrich Drepper released the GNU libc 2.0, which included the 10808@code{gettext}, @code{textdomain} and @code{bindtextdomain} functions. 10809 10810In 2000, Ulrich Drepper added plural form handling (the @code{ngettext} 10811function) to GNU libc. Later, in 2001, he released GNU libc 2.2.x, 10812which is the first free C library with full internationalization support. 10813 10814Ulrich being quite busy in his role of General Maintainer of GNU libc, 10815he handed over the GNU @code{gettext} maintenance to Bruno Haible in 108162000. Bruno added the plural form handling to the tools as well, added 10817support for UTF-8 and CJK locales, and wrote a few new tools for 10818manipulating PO files. 10819 10820@node References, , History, Conclusion 10821@section Related Readings 10822@cindex related reading 10823@cindex bibliography 10824 10825@strong{ NOTE: } This documentation section is outdated and needs to be 10826revised. 10827 10828Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting 10829bibliography on internationalization matters, called 10830@cite{Internationalization Reference List}, which is available as: 10831@example 10832ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt 10833@end example 10834 10835Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a 10836Frequently Asked Questions (FAQ) list, entitled @cite{Programming for 10837Internationalisation}. This FAQ discusses writing programs which 10838can handle different language conventions, character sets, etc.; 10839and is applicable to all character set encodings, with particular 10840emphasis on @w{ISO 8859-1}. It is regularly published in Usenet 10841groups @file{comp.unix.questions}, @file{comp.std.internat}, 10842@file{comp.software.international}, @file{comp.lang.c}, 10843@file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers} 10844and @file{news.answers}. The home location of this document is: 10845@example 10846ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming 10847@end example 10848 10849Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS 10850matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took 10851over the responsibility of maintaining it. It may be found as: 10852@example 10853ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/... 10854 ...locale-tutorial-0.8.txt.gz 10855@end example 10856@noindent 10857This site is mirrored in: 10858@example 10859ftp://ftp.ibp.fr/pub/linux/sunsite/ 10860@end example 10861 10862A French version of the same tutorial should be findable at: 10863@example 10864ftp://ftp.ibp.fr/pub/linux/french/docs/ 10865@end example 10866@noindent 10867together with French translations of many Linux-related documents. 10868 10869@node Language Codes, Country Codes, Conclusion, Top 10870@appendix Language Codes 10871@cindex language codes 10872@cindex ISO 639 10873 10874The @w{ISO 639} standard defines two-letter codes for many languages, and 10875three-letter codes for more rarely used languages. 10876All abbreviations for languages used in the Translation Project should 10877come from this standard. 10878 10879@menu 10880* Usual Language Codes:: Two-letter ISO 639 language codes 10881* Rare Language Codes:: Three-letter ISO 639 language codes 10882@end menu 10883 10884@node Usual Language Codes, Rare Language Codes, Language Codes, Language Codes 10885@appendixsec Usual Language Codes 10886 10887For the commonly used languages, the @w{ISO 639-1} standard defines two-letter 10888codes. 10889 10890@table @samp 10891@include iso-639.texi 10892@end table 10893 10894@node Rare Language Codes, , Usual Language Codes, Language Codes 10895@appendixsec Rare Language Codes 10896 10897For rarely used languages, the @w{ISO 639-2} standard defines three-letter 10898codes. Here is the current list, reduced to only living languages with at least 10899one million of speakers. 10900 10901@table @samp 10902@include iso-639-2.texi 10903@end table 10904 10905@node Country Codes, Licenses, Language Codes, Top 10906@appendix Country Codes 10907@cindex country codes 10908@cindex ISO 3166 10909 10910The @w{ISO 3166} standard defines two character codes for many countries 10911and territories. All abbreviations for countries used in the Translation 10912Project should come from this standard. 10913 10914@table @samp 10915@include iso-3166.texi 10916@end table 10917 10918@node Licenses, Program Index, Country Codes, Top 10919@appendix Licenses 10920@cindex Licenses 10921 10922The files of this package are covered by the licenses indicated in each 10923particular file or directory. Here is a summary: 10924 10925@itemize @bullet 10926@item 10927The @code{libintl} and @code{libasprintf} libraries are covered by the 10928GNU Library General Public License (LGPL). 10929A copy of the license is included in @ref{GNU LGPL}. 10930 10931@item 10932The executable programs of this package and the @code{libgettextpo} library 10933are covered by the GNU General Public License (GPL). 10934A copy of the license is included in @ref{GNU GPL}. 10935 10936@item 10937This manual is free documentation. It is dually licensed under the 10938GNU FDL and the GNU GPL. This means that you can redistribute this 10939manual under either of these two licenses, at your choice. 10940@* 10941This manual is covered by the GNU FDL. Permission is granted to copy, 10942distribute and/or modify this document under the terms of the 10943GNU Free Documentation License (FDL), either version 1.2 of the 10944License, or (at your option) any later version published by the 10945Free Software Foundation (FSF); with no Invariant Sections, with no 10946Front-Cover Text, and with no Back-Cover Texts. 10947A copy of the license is included in @ref{GNU FDL}. 10948@* 10949This manual is covered by the GNU GPL. You can redistribute it and/or 10950modify it under the terms of the GNU General Public License (GPL), either 10951version 2 of the License, or (at your option) any later version published 10952by the Free Software Foundation (FSF). 10953A copy of the license is included in @ref{GNU GPL}. 10954@end itemize 10955 10956@menu 10957* GNU GPL:: GNU General Public License 10958* GNU LGPL:: GNU Lesser General Public License 10959* GNU FDL:: GNU Free Documentation License 10960@end menu 10961 10962@page 10963@include gpl.texi 10964@page 10965@include lgpl.texi 10966@page 10967@include fdl.texi 10968 10969@node Program Index, Option Index, Licenses, Top 10970@unnumbered Program Index 10971 10972@printindex pg 10973 10974@node Option Index, Variable Index, Program Index, Top 10975@unnumbered Option Index 10976 10977@printindex op 10978 10979@node Variable Index, PO Mode Index, Option Index, Top 10980@unnumbered Variable Index 10981 10982@printindex vr 10983 10984@node PO Mode Index, Autoconf Macro Index, Variable Index, Top 10985@unnumbered PO Mode Index 10986 10987@printindex em 10988 10989@node Autoconf Macro Index, Index, PO Mode Index, Top 10990@unnumbered Autoconf Macro Index 10991 10992@printindex am 10993 10994@node Index, , Autoconf Macro Index, Top 10995@unnumbered General Index 10996 10997@printindex cp 10998 10999@iftex 11000@c Table of Contents 11001@contents 11002@end iftex 11003 11004@bye 11005 11006@c Local variables: 11007@c texinfo-column-for-description: 32 11008@c End: 11009