gettext-tools/doc/gettext_4.html

*946379e7Schristos<HTML>
*946379e7Schristos<HEAD>
*946379e7Schristos<!-- This HTML file has been created by texi2html 1.52b
*946379e7Schristos     from gettext.texi on 27 November 2006 -->
*946379e7Schristos
*946379e7Schristos<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
*946379e7Schristos<TITLE>GNU gettext utilities - 4  Preparing Program Sources</TITLE>
*946379e7Schristos</HEAD>
*946379e7Schristos<BODY>
*946379e7Schristos<P><HR><P>
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H1><A NAME="SEC11" HREF="gettext_toc.html#TOC11">4  Preparing Program Sources</A></H1>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX102"></A>
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosFor the programmer, changes to the C source code fall into three
*946379e7Schristoscategories.  First, you have to make the localization functions
*946379e7Schristosknown to all modules needing message translation.  Second, you should
*946379e7Schristosproperly trigger the operation of GNU <CODE>gettext</CODE> when the program
*946379e7Schristosinitializes, usually from the <CODE>main</CODE> function.  Last, you should
*946379e7Schristosidentify, adjust and mark all constant strings in your program
*946379e7Schristosneeding translation.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H2><A NAME="SEC12" HREF="gettext_toc.html#TOC12">4.1  Importing the <CODE>gettext</CODE> declaration</A></H2>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosPresuming that your set of programs, or package, has been adjusted
*946379e7Schristosso all needed GNU <CODE>gettext</CODE> files are available, and your
*946379e7Schristos<TT>&lsquo;Makefile&rsquo;</TT> files are adjusted (see section <A HREF="gettext_13.html#SEC196">13  The Maintainer's View</A>), each C module
*946379e7Schristoshaving translated C strings should contain the line:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX103"></A>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos#include &#60;libintl.h&#62;
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosSimilarly, each C module containing <CODE>printf()</CODE>/<CODE>fprintf()</CODE>/...
*946379e7Schristoscalls with a format string that could be a translated C string (even if
*946379e7Schristosthe C string comes from a different C module) should contain the line:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos#include &#60;libintl.h&#62;
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H2><A NAME="SEC13" HREF="gettext_toc.html#TOC13">4.2  Triggering <CODE>gettext</CODE> Operations</A></H2>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX104"></A>
*946379e7SchristosThe initialization of locale data should be done with more or less
*946379e7Schristosthe same code in every program, as demonstrated below:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosint
*946379e7Schristosmain (int argc, char *argv[])
*946379e7Schristos{
*946379e7Schristos  ...
*946379e7Schristos  setlocale (LC_ALL, "");
*946379e7Schristos  bindtextdomain (PACKAGE, LOCALEDIR);
*946379e7Schristos  textdomain (PACKAGE);
*946379e7Schristos  ...
*946379e7Schristos}
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristos<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> should be provided either by
*946379e7Schristos<TT>&lsquo;config.h&rsquo;</TT> or by the Makefile.  For now consult the <CODE>gettext</CODE>
*946379e7Schristosor <CODE>hello</CODE> sources for more information.
*946379e7Schristos
*946379e7Schristos</P>
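<P>
The initialization sequence above can be wrapped in a small helper, as in
the following minimal sketch.  The function name <CODE>init_i18n</CODE> and
the fallback values <CODE>"hello"</CODE> and <CODE>"/usr/share/locale"</CODE>
are assumptions made for illustration only; a real package takes
<VAR>PACKAGE</VAR> and <VAR>LOCALEDIR</VAR> from its build system.
</P>

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical fallbacks for illustration; a real package gets these
   from config.h or from -D options in the Makefile.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Performs the three initialization calls.  Returns 0 if all of them
   succeeded, -1 otherwise.  */
int
init_i18n (void)
{
  int status = 0;

  /* NULL here means the environment names a locale the system does
     not support; the program then keeps running in the "C" locale.  */
  if (setlocale (LC_ALL, "") == NULL)
    status = -1;
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    status = -1;
  if (textdomain (PACKAGE) == NULL)
    status = -1;
  return status;
}
```

<P>
A program would call <CODE>init_i18n ()</CODE> once, early in
<CODE>main</CODE>, before producing any output.  Whether a failure should
be reported or silently ignored is a design choice left to the package.
</P>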
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX105"></A>
*946379e7Schristos<A NAME="IDX106"></A>
The use of <CODE>LC_ALL</CODE> might not be appropriate for you.
<CODE>LC_ALL</CODE> includes all locale categories and, in particular,
<CODE>LC_CTYPE</CODE>.  This latter category determines the character
classes reported by <CODE>isalnum</CODE> and the other functions from
<TT>&lsquo;ctype.h&rsquo;</TT>, which can be wrong for programs that process some
kind of input language.  For example, it would mean that source code
using the &ccedil; (c-cedilla) character is runnable in
France but not in the U.S.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
Some systems also have problems parsing numbers with the
<CODE>scanf</CODE> functions when <CODE>LC_ALL</CODE> selects a locale
other than the <CODE>"C"</CODE> locale.
The standards say that formats in addition to the one known in the
<CODE>"C"</CODE> locale might be recognized.  But some systems seem to reject
numbers in the <CODE>"C"</CODE> locale format.  In some situations, the
notation itself can also make it impossible to
recognize whether a number is in the <CODE>"C"</CODE> locale format or the local
format.  This can happen if thousands separator characters are used.
Some locales define this character, following national
conventions, as <CODE>'.'</CODE>, which is the same character used in the
<CODE>"C"</CODE> locale to denote the decimal point.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
So it is sometimes necessary to replace the <CODE>LC_ALL</CODE> line in the
code above with a sequence of <CODE>setlocale</CODE> lines:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos{
*946379e7Schristos  ...
*946379e7Schristos  setlocale (LC_CTYPE, "");
*946379e7Schristos  setlocale (LC_MESSAGES, "");
*946379e7Schristos  ...
*946379e7Schristos}
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX107"></A>
*946379e7Schristos<A NAME="IDX108"></A>
*946379e7Schristos<A NAME="IDX109"></A>
*946379e7Schristos<A NAME="IDX110"></A>
*946379e7Schristos<A NAME="IDX111"></A>
*946379e7Schristos<A NAME="IDX112"></A>
*946379e7Schristos<A NAME="IDX113"></A>
*946379e7SchristosOn all POSIX conformant systems the locale categories <CODE>LC_CTYPE</CODE>,
*946379e7Schristos<CODE>LC_MESSAGES</CODE>, <CODE>LC_COLLATE</CODE>, <CODE>LC_MONETARY</CODE>,
*946379e7Schristos<CODE>LC_NUMERIC</CODE>, and <CODE>LC_TIME</CODE> are available.  On some systems
*946379e7Schristoswhich are only ISO C compliant, <CODE>LC_MESSAGES</CODE> is missing, but
*946379e7Schristosa substitute for it is defined in GNU gettext's <CODE>&#60;libintl.h&#62;</CODE>.
*946379e7Schristos
*946379e7Schristos</P>
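<P>
The effect of setting only selected categories can be sketched as follows.
Because <CODE>LC_NUMERIC</CODE> is never changed, it keeps the
<CODE>"C"</CODE> locale that every program starts in, so the decimal point
stays <CODE>'.'</CODE> whatever the user's environment says.  The function
names are hypothetical and chosen only for this example.
</P>

```c
#include <locale.h>
#include <libintl.h>
#include <stdio.h>

/* Adopts the user's character classification and message language,
   but leaves every other category, including LC_NUMERIC, in the
   "C" locale that is in effect at program startup.  */
void
init_locale_partial (void)
{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
}

/* Formats VALUE with one digit after the decimal point into BUF.
   The decimal point character is taken from LC_NUMERIC.  */
int
format_number (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "%.1f", value);
}
```

<P>
After <CODE>init_locale_partial ()</CODE>, formatting the value 1.5 with
<CODE>format_number</CODE> yields <CODE>"1.5"</CODE> even when the
environment selects a locale whose decimal point is a comma.
</P>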
*946379e7Schristos<P>
*946379e7SchristosNote that changing the <CODE>LC_CTYPE</CODE> also affects the functions
*946379e7Schristosdeclared in the <CODE>&#60;ctype.h&#62;</CODE> standard header.  If this is not
*946379e7Schristosdesirable in your application (for example in a compiler's parser),
*946379e7Schristosyou can use a set of substitute functions which hardwire the C locale,
*946379e7Schristossuch as found in the <CODE>&#60;c-ctype.h&#62;</CODE> and <CODE>&#60;c-ctype.c&#62;</CODE> files
*946379e7Schristosin the gettext source distribution.
*946379e7Schristos
*946379e7Schristos</P>
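<P>
To illustrate what "hardwiring the C locale" means, here is a minimal
sketch of one such substitute function.  It is not the code from the
gettext distribution; the name <CODE>ascii_isalnum</CODE> is hypothetical,
and the range comparisons assume an ASCII-compatible execution character
set.
</P>

```c
#include <stdbool.h>

/* Hypothetical helper, not the implementation shipped with gettext.
   Classifies only the ASCII letters and digits as alphanumeric,
   regardless of the current LC_CTYPE setting.  Assumes that letters
   and digits are contiguous, as they are in ASCII.  */
bool
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'a' && c <= 'z')
         || (c >= 'A' && c <= 'Z');
}
```

<P>
A parser built on such functions accepts the same identifiers in every
locale, because nothing in it consults <CODE>LC_CTYPE</CODE>.
</P>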
*946379e7Schristos<P>
It is also possible to switch the locale back and forth between the
environment-dependent locale and the C locale, but this approach is
normally avoided because a <CODE>setlocale</CODE> call is expensive,
because it is tedious to determine the places where a locale switch
is needed in a large program's source, and because switching a locale
is not multithread-safe.
*946379e7Schristos
*946379e7Schristos</P>
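<P>
For comparison, the following sketch shows what one such temporary switch
looks like; it makes the cost visible rather than recommending the
technique.  The function name <CODE>parse_double_c_locale</CODE> is
hypothetical.  The current locale name has to be copied, because a later
<CODE>setlocale</CODE> call may overwrite the string that an earlier call
returned.
</P>

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Parses TEXT as a number written in "C" locale notation, switching
   LC_NUMERIC to "C" for the duration of the call and restoring it
   afterwards.  Not safe when other threads are running.  */
double
parse_double_c_locale (const char *text)
{
  double value;
  char *saved = NULL;
  const char *current = setlocale (LC_NUMERIC, NULL);

  /* Copy the name: the next setlocale call may reuse the storage.  */
  if (current != NULL)
    {
      saved = malloc (strlen (current) + 1);
      if (saved != NULL)
        strcpy (saved, current);
    }

  setlocale (LC_NUMERIC, "C");
  value = strtod (text, NULL);

  if (saved != NULL)
    {
      setlocale (LC_NUMERIC, saved);
      free (saved);
    }
  return value;
}
```

<P>
Every call performs two extra <CODE>setlocale</CODE> calls and one
allocation, which is the overhead the paragraph above warns about.
</P>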
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H2><A NAME="SEC14" HREF="gettext_toc.html#TOC14">4.3  Preparing Translatable Strings</A></H2>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX114"></A>
Before strings can be marked for translation, they sometimes need to
be adjusted.  Usually preparing a string for translation is done right
before marking it, during the marking phase which is described in the
next sections.  What you have to keep in mind while doing that is the
following.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<UL>
*946379e7Schristos<LI>
*946379e7Schristos
*946379e7SchristosDecent English style.
*946379e7Schristos
*946379e7Schristos<LI>
*946379e7Schristos
*946379e7SchristosEntire sentences.
*946379e7Schristos
*946379e7Schristos<LI>
*946379e7Schristos
*946379e7SchristosSplit at paragraphs.
*946379e7Schristos
*946379e7Schristos<LI>
*946379e7Schristos
*946379e7SchristosUse format strings instead of string concatenation.
*946379e7Schristos
*946379e7Schristos<LI>
*946379e7Schristos
*946379e7SchristosAvoid unusual markup and unusual control characters.
*946379e7Schristos</UL>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosLet's look at some examples of these guidelines.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX115"></A>
Translatable strings should be in good English style.  If slang
with abbreviations and shortcuts is used, translators will often not
understand the message and will produce very inappropriate translations.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos"%s: is parameter\n"
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosThis is nearly untranslatable: Is the displayed item <EM>a</EM> parameter or
*946379e7Schristos<EM>the</EM> parameter?
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos"No match"
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosThe ambiguity in this message makes it unintelligible: Is the program
*946379e7Schristosattempting to set something on fire? Does it mean "The given object does
*946379e7Schristosnot match the template"? Does it mean "The template does not fit for any
*946379e7Schristosof the objects"?
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX116"></A>
*946379e7SchristosIn both cases, adding more words to the message will help both the
*946379e7Schristostranslator and the English speaking user.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX117"></A>
*946379e7SchristosTranslatable strings should be entire sentences.  It is often not possible
*946379e7Schristosto translate single verbs or adjectives in a substitutable way.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosprintf ("File %s is %s protected", filename, rw ? "write" : "read");
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosMost translators will not look at the source and will thus only see the
*946379e7Schristosstring <CODE>"File %s is %s protected"</CODE>, which is unintelligible.  Change
*946379e7Schristosthis to
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosprintf (rw ? "File %s is write protected" : "File %s is read protected",
*946379e7Schristos        filename);
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
This way the translator will not only understand the message, she will
also be able to find the appropriate grammatical construction.  A French
translator, for example, translates "write protected" as "protected
against writing".
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
Entire sentences are also important because in many languages, the
declension of some word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence.  There are
usually more interdependencies between words than in English.  The
consequence is that asking a translator to translate two half-sentences
and then combining these two half-sentences through dumb string concatenation
will not work for many languages, even though it would work for English.
That's why translators need to handle entire sentences.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosOften sentences don't fit into a single line.  If a sentence is output
*946379e7Schristosusing two subsequent <CODE>printf</CODE> statements, like this
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosprintf ("Locale charset \"%s\" is different from\n", lcharset);
*946379e7Schristosprintf ("input file charset \"%s\".\n", fcharset);
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristosthe translator would have to translate two half sentences, but nothing
*946379e7Schristosin the POT file would tell her that the two half sentences belong together.
*946379e7SchristosIt is necessary to merge the two <CODE>printf</CODE> statements so that the
*946379e7Schristostranslator can handle the entire sentence at once and decide at which
*946379e7Schristosplace to insert a line break in the translation (if at all):
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosprintf ("Locale charset \"%s\" is different from\n\
*946379e7Schristosinput file charset \"%s\".\n", lcharset, fcharset);
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosYou may now ask: how about two or more adjacent sentences? Like in this case:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosputs ("Apollo 13 scenario: Stack overflow handling failed.");
*946379e7Schristosputs ("On the next stack overflow we will crash!!!");
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
Should these two statements be merged into a single one? I would recommend
merging them if the two sentences are related to each other, because then it
is easier for the translator to understand and translate both.  On
the other hand, if one of the two messages is a stereotypical one, occurring
in other places as well, you will do a favour to the translator by not
merging the two.  (Identical messages occurring in several places are
combined by <CODE>xgettext</CODE>, so the translator has to handle them only once.)
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX118"></A>
*946379e7SchristosTranslatable strings should be limited to one paragraph; don't let a
*946379e7Schristossingle message be longer than ten lines.  The reason is that when the
*946379e7Schristostranslatable string changes, the translator is faced with the task of
*946379e7Schristosupdating the entire translated string.  Maybe only a single word will
*946379e7Schristoshave changed in the English string, but the translator doesn't see that
*946379e7Schristos(with the current translation tools), therefore she has to proofread
*946379e7Schristosthe entire message.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX119"></A>
*946379e7SchristosMany GNU programs have a <SAMP>&lsquo;--help&rsquo;</SAMP> output that extends over several
*946379e7Schristosscreen pages.  It is a courtesy towards the translators to split such a
*946379e7Schristosmessage into several ones of five to ten lines each.  While doing that,
*946379e7Schristosyou can also attempt to split the documented options into groups,
*946379e7Schristossuch as the input options, the output options, and the informative
*946379e7Schristosoutput options.  This will help every user to find the option he is
*946379e7Schristoslooking for.
*946379e7Schristos
*946379e7Schristos</P>
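<P>
A help routine split along these lines might look like the sketch below.
The program name <SAMP>&lsquo;frob&rsquo;</SAMP>, its options, and the function
<CODE>print_help</CODE> are invented for illustration.  Each
<CODE>fputs</CODE> call carries one message of a few lines, so a change to
one option group invalidates only that group's translation.
</P>

```c
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Prints the help text of a hypothetical program "frob" to OUT, one
   translatable message per option group.  Returns 0 on success and
   -1 if any write failed.  */
int
print_help (FILE *out)
{
  int failed = 0;

  failed |= fputs (_("Usage: frob [OPTION]... [FILE]...\n"), out) == EOF;
  failed |= fputs (_("Input options:\n"
                     "  -f, --from=FILE    read input from FILE\n"),
                   out) == EOF;
  failed |= fputs (_("Output options:\n"
                     "  -o, --output=FILE  write output to FILE\n"),
                   out) == EOF;
  failed |= fputs (_("Informative output:\n"
                     "  -h, --help         display this help and exit\n"
                     "  -V, --version      display version and exit\n"),
                   out) == EOF;
  return failed ? -1 : 0;
}
```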
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX120"></A>
*946379e7Schristos<A NAME="IDX121"></A>
*946379e7SchristosHardcoded string concatenation is sometimes used to construct English
*946379e7Schristosstrings:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosstrcpy (s, "Replace ");
*946379e7Schristosstrcat (s, object1);
*946379e7Schristosstrcat (s, " with ");
*946379e7Schristosstrcat (s, object2);
*946379e7Schristosstrcat (s, "?");
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosIn order to present to the translator only entire sentences, and also
*946379e7Schristosbecause in some languages the translator might want to swap the order
*946379e7Schristosof <CODE>object1</CODE> and <CODE>object2</CODE>, it is necessary to change this
*946379e7Schristosto use a format string:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristossprintf (s, "Replace %s with %s?", object1, object2);
*946379e7Schristos</PRE>
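
<P>
The sketch below shows why the format string makes reordering possible.
With the numbered argument syntax <SAMP>&lsquo;%1$s&rsquo;</SAMP> and
<SAMP>&lsquo;%2$s&rsquo;</SAMP>, a translated format can refer to the arguments
in a different order while the calling code stays unchanged.  This syntax
is a POSIX feature rather than part of ISO C, so its availability on a
given platform is an assumption.  The helper
<CODE>format_replace_question</CODE> and the reordered format used in the
note after the code are made up for the example; in real code the format
would be the result of a <CODE>gettext</CODE> call.
</P>

```c
#include <stdio.h>

/* Formats the question into BUF using FORMAT, which stands in for the
   (possibly translated) format string.  OBJECT1 and OBJECT2 are always
   passed in the same order; FORMAT decides where each one appears.  */
int
format_replace_question (char *buf, size_t size, const char *format,
                         const char *object1, const char *object2)
{
  return snprintf (buf, size, format, object1, object2);
}
```

<P>
Called with <CODE>"Replace %s with %s?"</CODE> and the objects
<CODE>"a"</CODE> and <CODE>"b"</CODE>, this produces
<CODE>"Replace a with b?"</CODE>.  Called with the stand-in format
<CODE>"With %2$s, replace %1$s?"</CODE> and the same arguments, it
produces <CODE>"With b, replace a?"</CODE>.
</P>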
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX122"></A>
*946379e7SchristosA similar case is compile time concatenation of strings.  The ISO C 99
*946379e7Schristosinclude file <CODE>&#60;inttypes.h&#62;</CODE> contains a macro <CODE>PRId64</CODE> that
*946379e7Schristoscan be used as a formatting directive for outputting an <SAMP>&lsquo;int64_t&rsquo;</SAMP>
*946379e7Schristosinteger through <CODE>printf</CODE>.  It expands to a constant string, usually
*946379e7Schristos"d" or "ld" or "lld" or something like this, depending on the platform.
*946379e7SchristosAssume you have code like
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosprintf ("The amount is %0" PRId64 "\n", number);
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosThe <CODE>gettext</CODE> tools and library have special support for these
*946379e7Schristos<CODE>&#60;inttypes.h&#62;</CODE> macros.  You can therefore simply write
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosprintf (gettext ("The amount is %0" PRId64 "\n"), number);
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosThe PO file will contain the string "The amount is %0&#60;PRId64&#62;\n".
*946379e7SchristosThe translators will provide a translation containing "%0&#60;PRId64&#62;"
*946379e7Schristosas well, and at runtime the <CODE>gettext</CODE> function's result will
*946379e7Schristoscontain the appropriate constant string, "d" or "ld" or "lld".
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosThis works only for the predefined <CODE>&#60;inttypes.h&#62;</CODE> macros.  If
*946379e7Schristosyou have defined your own similar macros, let's say <SAMP>&lsquo;MYPRId64&rsquo;</SAMP>,
*946379e7Schristosthat are not known to <CODE>xgettext</CODE>, the solution for this problem
*946379e7Schristosis to change the code like this:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristoschar buf1[100];
*946379e7Schristossprintf (buf1, "%0" MYPRId64, number);
*946379e7Schristosprintf (gettext ("The amount is %s\n"), buf1);
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
That is, you put the platform-dependent code in one statement, and the
internationalization code in a different statement.  Note that a buffer length
of 100 is safe, because all available hardware integer types are limited to
128 bits, and to print a 128-bit integer one needs at most 54 characters,
regardless of whether it is in decimal, octal or hexadecimal.
*946379e7Schristos
*946379e7Schristos</P>
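<P>
Put together as a complete function, the two-statement pattern might look
like this sketch.  Here <CODE>MYPRId64</CODE> is simply defined as
<CODE>PRId64</CODE> to stand in for a project-specific macro, and the
function name <CODE>format_amount</CODE> is hypothetical.  The sketch uses
<CODE>snprintf</CODE> instead of <CODE>sprintf</CODE> so that the buffer
bound is enforced rather than merely argued.
</P>

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Stand-in for a project-specific macro that xgettext does not know.  */
#define MYPRId64 PRId64

/* Writes the translated message for NUMBER into OUT.  */
int
format_amount (char *out, size_t size, int64_t number)
{
  char buf1[100];

  /* Platform-dependent formatting, invisible to the translator.  */
  snprintf (buf1, sizeof buf1, "%" MYPRId64, number);
  /* Internationalized message with a plain %s directive.  */
  return snprintf (out, size, gettext ("The amount is %s\n"), buf1);
}
```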
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX123"></A>
*946379e7Schristos<A NAME="IDX124"></A>
All this applies to other programming languages as well.  For example, in
Java and C#, string concatenation is very frequently used, because it is a
compiler built-in operator.  As in C, in Java you would change
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7SchristosSystem.out.println("Replace "+object1+" with "+object2+"?");
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristosinto a statement involving a format string:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7SchristosSystem.out.println(
*946379e7Schristos    MessageFormat.format("Replace {0} with {1}?",
*946379e7Schristos                         new Object[] { object1, object2 }));
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosSimilarly, in C#, you would change
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7SchristosConsole.WriteLine("Replace "+object1+" with "+object2+"?");
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristosinto a statement involving a format string:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7SchristosConsole.WriteLine(
*946379e7Schristos    String.Format("Replace {0} with {1}?", object1, object2));
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX125"></A>
*946379e7Schristos<A NAME="IDX126"></A>
*946379e7SchristosUnusual markup or control characters should not be used in translatable
*946379e7Schristosstrings.  Translators will likely not understand the particular meaning
*946379e7Schristosof the markup or control characters.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosFor example, if you have a convention that <SAMP>&lsquo;|&rsquo;</SAMP> delimits the
*946379e7Schristosleft-hand and right-hand part of some GUI elements, translators will
*946379e7Schristosoften not understand it without specific comments.  It might be
*946379e7Schristosbetter to have the translator translate the left-hand and right-hand
*946379e7Schristospart separately.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosAnother example is the <SAMP>&lsquo;argp&rsquo;</SAMP> convention to use a single <SAMP>&lsquo;\v&rsquo;</SAMP>
*946379e7Schristos(vertical tab) control character to delimit two sections inside a
*946379e7Schristosstring.  This is flawed.  Some translators may convert it to a simple
*946379e7Schristosnewline, some to blank lines.  With some PO file editors it may not be
*946379e7Schristoseasy to even enter a vertical tab control character.  So, you cannot
*946379e7Schristosbe sure that the translation will contain a <SAMP>&lsquo;\v&rsquo;</SAMP> character, at the
*946379e7Schristoscorresponding position.  The solution is, again, to let the translator
*946379e7Schristostranslate two separate strings and combine at run-time the two translated
*946379e7Schristosstrings with the <SAMP>&lsquo;\v&rsquo;</SAMP> required by the convention.
*946379e7Schristos
*946379e7Schristos</P>
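<P>
The run-time combination can be sketched as below.  The helper
<CODE>join_argp_doc</CODE> is hypothetical; it only builds the combined
string and leaves the two <CODE>gettext</CODE> calls to the caller, so
each half stays a separate, ordinary message in the PO file.
</P>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string consisting of BEFORE, a vertical
   tab, and AFTER, or NULL if memory is exhausted.  The caller frees
   the result.  */
char *
join_argp_doc (const char *before, const char *after)
{
  size_t len = strlen (before) + 1 + strlen (after) + 1;
  char *doc = malloc (len);

  if (doc != NULL)
    snprintf (doc, len, "%s\v%s", before, after);
  return doc;
}
```

<P>
A caller would pass the two translated halves, for example
<CODE>join_argp_doc (gettext ("Text before the options."), gettext
("Text after the options."))</CODE>, where both message texts are
placeholders.
</P>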
*946379e7Schristos<P>
*946379e7SchristosHTML markup, however, is common enough that it's probably ok to use in
*946379e7Schristostranslatable strings.  But please bear in mind that the GNU gettext tools
*946379e7Schristosdon't verify that the translations are well-formed HTML.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H2><A NAME="SEC15" HREF="gettext_toc.html#TOC15">4.4  How Marks Appear in Sources</A></H2>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX127"></A>
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
All strings requiring translation should be marked in the C sources.  Marking
is done in such a way that each translatable string appears to be
the sole argument of some function or preprocessor macro.  There are
only a few such functions or macros meant for translation,
and their names are called marking keywords.  The marking is
attached to the strings themselves, rather than to what we do with them.
This approach is more widely applicable.  A clear example is an error message
produced by formatting.  The format string needs translation, as
do some strings inserted through a <SAMP>&lsquo;%s&rsquo;</SAMP> specification
in the format, while the result from <CODE>sprintf</CODE> may have so many
different instances that it is impractical to list them all in some
<SAMP>&lsquo;error_string_out()&rsquo;</SAMP> routine, say.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosThis marking operation has two goals.  The first goal of marking
*946379e7Schristosis for triggering the retrieval of the translation, at run time.
*946379e7SchristosThe keyword is possibly resolved into a routine able to dynamically
*946379e7Schristosreturn the proper translation, as far as possible or wanted, for the
*946379e7Schristosargument string.  Most localizable strings are found in executable
*946379e7Schristospositions, that is, attached to variables or given as parameters to
*946379e7Schristosfunctions.  But this is not universal usage, and some translatable
*946379e7Schristosstrings appear in structured initializations.  See section <A HREF="gettext_4.html#SEC18">4.7  Special Cases of Translatable Strings</A>.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
The second goal of the marking operation is to help <CODE>xgettext</CODE>
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
The canonical keyword for marking translatable strings is
<SAMP>&lsquo;gettext&rsquo;</SAMP>; it gave its name to the whole GNU <CODE>gettext</CODE>
package.  For packages making only light use of the <SAMP>&lsquo;gettext&rsquo;</SAMP>
keyword, macro or function, it is easily used <EM>as is</EM>.  However,
for packages using the <CODE>gettext</CODE> interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name.  Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them forcefully, all the time, that they
are internationalized.  Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX128"></A>
Many packages use <SAMP>&lsquo;_&rsquo;</SAMP> (a single underscore) as a keyword,
and write <SAMP>&lsquo;_("Translatable string")&rsquo;</SAMP> instead of <SAMP>&lsquo;gettext
("Translatable string")&rsquo;</SAMP>.  Further, the rule from the GNU coding standards
that requires a space between the keyword and the opening
parenthesis is relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underscore and the two parentheses.
However, even though GNU <CODE>gettext</CODE> uses this convention internally,
it does not offer it officially.  The real, genuine keyword is
<SAMP>&lsquo;gettext&rsquo;</SAMP>.  It is fairly easy for those wanting to use
<SAMP>&lsquo;_&rsquo;</SAMP> instead of <SAMP>&lsquo;gettext&rsquo;</SAMP> to declare:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos#include &#60;libintl.h&#62;
*946379e7Schristos#define _(String) gettext (String)
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristosinstead of merely using <SAMP>&lsquo;#include &#60;libintl.h&#62;&rsquo;</SAMP>.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosThe marking keywords <SAMP>&lsquo;gettext&rsquo;</SAMP> and <SAMP>&lsquo;_&rsquo;</SAMP> take the translatable
*946379e7Schristosstring as sole argument.  It is also possible to define marking functions
*946379e7Schristosthat take it at another argument position.  It is even possible to make
*946379e7Schristosthe marked argument position depend on the total number of arguments of
*946379e7Schristosthe function call; this is useful in C++.  All this is achieved using
*946379e7Schristos<CODE>xgettext</CODE>'s <SAMP>&lsquo;--keyword&rsquo;</SAMP> option.
*946379e7Schristos
*946379e7Schristos</P>
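<P>
As an illustration of a marking function that takes the translatable
string at another argument position, consider the sketch below.  The
function <CODE>report</CODE> is hypothetical.  Since the message is its
fourth argument, the sources would be scanned with
<SAMP>&lsquo;--keyword=report:4&rsquo;</SAMP> so that <CODE>xgettext</CODE>
extracts the right string.
</P>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical marking function: MSGID is the untranslated message,
   looked up here at run time.  Extract its argument with
     xgettext --keyword=report:4 ...
   because MSGID is the fourth argument.  */
int
report (char *buf, size_t size, int code, const char *msgid)
{
  return snprintf (buf, size, "[%d] %s", code, gettext (msgid));
}
```

<P>
A call such as <CODE>report (buf, sizeof buf, 3, "Disk is full")</CODE>
then needs no explicit <CODE>gettext</CODE> or <SAMP>&lsquo;_&rsquo;</SAMP> around
the string literal.
</P>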
*946379e7Schristos<P>
*946379e7SchristosNote also that long strings can be split across lines, into multiple
*946379e7Schristosadjacent string tokens.  Automatic string concatenation is performed
*946379e7Schristosat compile time according to ISO C and ISO C++; <CODE>xgettext</CODE> also
*946379e7Schristossupports this syntax.
*946379e7Schristos
*946379e7Schristos</P>
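<P>
For instance, the two-line message from the previous section can be
written as two adjacent string tokens inside a single marking keyword.
The compiler joins them into one string, and the result is a single
message for the translator.  The function name
<CODE>charset_warning_format</CODE> is made up for this sketch.
</P>

```c
#include <libintl.h>

#define _(String) gettext (String)

/* Returns the (possibly translated) format for the charset warning.
   The two adjacent string tokens form one string, hence one msgid.  */
const char *
charset_warning_format (void)
{
  return _("Locale charset \"%s\" is different from\n"
           "input file charset \"%s\".\n");
}
```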
*946379e7Schristos<P>
Later on, maintenance is relatively easy.  If, as a programmer,
you add or modify a string, you will have to ask yourself whether the
new or altered string requires translation, and enclose it in
<SAMP>&lsquo;_()&rsquo;</SAMP> if you think it should be translated.  For example, <SAMP>&lsquo;"%s"&rsquo;</SAMP>
is a string <EM>not</EM> requiring translation.  But
<SAMP>&lsquo;"%s: %d"&rsquo;</SAMP> <EM>does</EM> require translation, because in French, unlike
in English, it's customary to put a space before a colon.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H2><A NAME="SEC16" HREF="gettext_toc.html#TOC16">4.5  Marking Translatable Strings</A></H2>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX129"></A>
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosIn PO mode, one set of features is meant more for the programmer than
*946379e7Schristosfor the translator, and allows him to interactively mark which strings,
*946379e7Schristosin a set of program sources, are translatable, and which are not.
*946379e7SchristosEven if it is a fairly easy job for a programmer to find and mark
*946379e7Schristossuch strings by other means, using any editor of his choice, PO mode
*946379e7Schristosmakes this work more comfortable.  Further, this gives translators
*946379e7Schristoswho feel a little like programmers, or programmers who feel a little
*946379e7Schristoslike translators, a tool letting them work at marking translatable
*946379e7Schristosstrings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX130"></A>
The set of program sources targeted by the PO mode commands described
*946379e7Schristoshere, should have an Emacs tags table constructed for your project,
*946379e7Schristosprior to using these PO file commands.  This is easy to do.  In any
*946379e7Schristosshell window, change the directory to the root of your project, then
*946379e7Schristosexecute a command resembling:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosetags src/*.[hc] lib/*.[hc]
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristospresuming here you want to process all <TT>&lsquo;.h&rsquo;</TT> and <TT>&lsquo;.c&rsquo;</TT> files
*946379e7Schristosfrom the <TT>&lsquo;src/&rsquo;</TT> and <TT>&lsquo;lib/&rsquo;</TT> directories.  This command will
*946379e7Schristosexplore all said files and create a <TT>&lsquo;TAGS&rsquo;</TT> file in your root
*946379e7Schristosdirectory, somewhat summarizing the contents using a special file
*946379e7Schristosformat Emacs can understand.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX131"></A>
*946379e7SchristosFor packages following the GNU coding standards, there is
*946379e7Schristosa make goal <CODE>tags</CODE> or <CODE>TAGS</CODE> which constructs the tag files in
*946379e7Schristosall directories and for all files containing source code.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosOnce your <TT>&lsquo;TAGS&rsquo;</TT> file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
*946379e7SchristosBut these commands are necessarily driven from within a PO file
*946379e7Schristoswindow, and it is likely that you do not even have such a PO file yet.
*946379e7SchristosThis is not a problem at all, as you may safely open a new, empty PO
*946379e7Schristosfile, mainly for using these commands.  This empty PO file will slowly
*946379e7Schristosfill in while you mark strings as translatable in your program sources.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<DL COMPACT>
*946379e7Schristos
*946379e7Schristos<DT><KBD>,</KBD>
*946379e7Schristos<DD>
*946379e7Schristos<A NAME="IDX132"></A>
*946379e7SchristosSearch through program sources for a string which looks like a
*946379e7Schristoscandidate for translation (<CODE>po-tags-search</CODE>).
*946379e7Schristos
*946379e7Schristos<DT><KBD>M-,</KBD>
*946379e7Schristos<DD>
*946379e7Schristos<A NAME="IDX133"></A>
*946379e7SchristosMark the last string found with <SAMP>&lsquo;_()&rsquo;</SAMP> (<CODE>po-mark-translatable</CODE>).
*946379e7Schristos
*946379e7Schristos<DT><KBD>M-.</KBD>
*946379e7Schristos<DD>
*946379e7Schristos<A NAME="IDX134"></A>
*946379e7SchristosMark the last string found with a keyword taken from a set of possible
*946379e7Schristoskeywords.  This command with a prefix allows some management of these
*946379e7Schristoskeywords (<CODE>po-select-mark-and-mark</CODE>).
*946379e7Schristos
*946379e7Schristos</DL>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX135"></A>
*946379e7SchristosThe <KBD>,</KBD> (<CODE>po-tags-search</CODE>) command searches for the next
*946379e7Schristosoccurrence of a string which looks like a possible candidate for
*946379e7Schristostranslation, and displays the program source in another Emacs window,
*946379e7Schristospositioned in such a way that the string is near the top of this other
*946379e7Schristoswindow.  If the string is too big to fit whole in this window, it is
*946379e7Schristospositioned so only its end is shown.  In any case, the cursor
*946379e7Schristosis left in the PO file window.  If the shown string would be better
*946379e7Schristospresented differently in different native languages, you may mark it
*946379e7Schristosusing <KBD>M-,</KBD> or <KBD>M-.</KBD>.  Otherwise, you might rather ignore it
*946379e7Schristosand skip to the next string by merely repeating the <KBD>,</KBD> command.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosA string is a good candidate for translation if it contains a sequence
*946379e7Schristosof three or more letters.  A string containing at most two letters in
*946379e7Schristosa row will be considered as a candidate if it has more letters than
*946379e7Schristosnon-letters.  The command disregards strings containing no letters,
*946379e7Schristosor isolated letters only.  It also disregards strings within comments,
*946379e7Schristosor strings already marked with some keyword PO mode knows (see below).
*946379e7Schristos
*946379e7Schristos</P>
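<P>
The rules above might be sketched in C as follows.  This is only one
reading of the prose, not the code PO mode actually uses (PO mode is
written in Emacs Lisp); the function name <CODE>looks_translatable</CODE> is
hypothetical, and a letter is assumed to be whatever <CODE>isalpha</CODE>
accepts in the current locale.  Comments and already marked strings are
not modelled here.

</P>

<PRE>
#include &#60;ctype.h&#62;
#include &#60;stdbool.h&#62;

/* Hypothetical helper: returns true when STRING looks like a
   candidate for translation under the rules described above.  */
static bool
looks_translatable (const char *string)
{
  int letters = 0, others = 0, run = 0, longest_run = 0;

  for (const unsigned char *p = (const unsigned char *) string; *p; p++)
    {
      if (isalpha (*p))
        {
          letters++;
          if (++run &#62; longest_run)
            longest_run = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  /* Three or more letters in a row: always a candidate.  */
  if (longest_run &#62;= 3)
    return true;
  /* At most two letters in a row: letters must outnumber non-letters.  */
  if (longest_run == 2)
    return letters &#62; others;
  /* No letters, or isolated letters only.  */
  return false;
}
</PRE>

<P>
Under this reading, <CODE>"Hello"</CODE> and <CODE>"ab"</CODE> are candidates, while
<CODE>"%s"</CODE> and <CODE>"%s: %d"</CODE> are not, since they contain isolated
letters only.

</P>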
*946379e7Schristos<P>
*946379e7SchristosIf you have never told Emacs about some <TT>&lsquo;TAGS&rsquo;</TT> file to use, the
*946379e7Schristoscommand will request that you specify one from the minibuffer, the
*946379e7Schristosfirst time you use the command.  You may later change your <TT>&lsquo;TAGS&rsquo;</TT>
*946379e7Schristosfile by using the regular Emacs command <KBD>M-x visit-tags-table</KBD>,
*946379e7Schristoswhich will ask you to name the precise <TT>&lsquo;TAGS&rsquo;</TT> file you want
*946379e7Schristosto use.  See section ‘Tag Tables’ in <CITE>The Emacs Editor</CITE>.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosEach time you use the <KBD>,</KBD> command, the search resumes from where it was
*946379e7Schristosleft by the previous search, and goes through all program sources,
*946379e7Schristosobeying the <TT>&lsquo;TAGS&rsquo;</TT> file, until all sources have been processed.
However, by giving a prefix argument to the command (<KBD>C-u ,</KBD>),
you may request that the search be restarted all over again
*946379e7Schristosfrom the first program source; but in this case, strings that you
*946379e7Schristosrecently marked as translatable will be automatically skipped.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
Using this <KBD>,</KBD> command does not prevent the use of other regular
Emacs tags commands.  For example, regular <CODE>tags-search</CODE> or
<CODE>tags-query-replace</CODE> commands may be used without disrupting the
independent <KBD>,</KBD> search sequence.  However, as implemented, the
<EM>initial</EM> <KBD>,</KBD> command (or a <KBD>,</KBD> command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX136"></A>
*946379e7Schristos<A NAME="IDX137"></A>
*946379e7SchristosThe <KBD>M-,</KBD> (<CODE>po-mark-translatable</CODE>) command will mark the
*946379e7Schristosrecently found string with the <SAMP>&lsquo;_&rsquo;</SAMP> keyword.  The <KBD>M-.</KBD>
*946379e7Schristos(<CODE>po-select-mark-and-mark</CODE>) command will request that you type
*946379e7Schristosone keyword from the minibuffer and use that keyword for marking
*946379e7Schristosthe string.  Both commands will automatically create a new PO file
*946379e7Schristosuntranslated entry for the string being marked, and make it the
*946379e7Schristoscurrent entry (making it easy for you to immediately proceed to its
*946379e7Schristostranslation, if you feel like doing it right away).  It is possible
*946379e7Schristosthat the modifications made to the program source by <KBD>M-,</KBD> or
*946379e7Schristos<KBD>M-.</KBD> render some source line longer than 80 columns, forcing you
*946379e7Schristosto break and re-indent this line differently.  You may use the <KBD>O</KBD>
*946379e7Schristoscommand from PO mode, or any other window changing command from
*946379e7SchristosEmacs, to break out into the program source window, and do any
*946379e7Schristosneeded adjustments.  You will have to use some regular Emacs command
*946379e7Schristosto return the cursor to the PO file window, if you want command
*946379e7Schristos<KBD>,</KBD> for the next string, say.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosThe <KBD>M-.</KBD> command has a few built-in speedups, so you do not
*946379e7Schristoshave to explicitly type all keywords all the time.  The first such
*946379e7Schristosspeedup is that you are presented with a <EM>preferred</EM> keyword,
which you may accept by merely typing <KBD>RET</KBD> at the prompt.
*946379e7SchristosThe second speedup is that you may type any non-ambiguous prefix of the
*946379e7Schristoskeyword you really mean, and the command will complete it automatically
*946379e7Schristosfor you.  This also means that PO mode has to <EM>know</EM> all
*946379e7Schristosyour possible keywords, and that it will not accept mistyped keywords.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosIf you reply <KBD>?</KBD> to the keyword request, the command gives a
*946379e7Schristoslist of all known keywords, from which you may choose.  When the
*946379e7Schristoscommand is prefixed by an argument (<KBD>C-u M-.</KBD>), it inhibits
*946379e7Schristosupdating any program source or PO file buffer, and does some simple
*946379e7Schristoskeyword management instead.  In this case, the command asks for a
*946379e7Schristoskeyword, written in full, which becomes a new allowed keyword for
*946379e7Schristoslater <KBD>M-.</KBD> commands.  Moreover, this new keyword automatically
*946379e7Schristosbecomes the <EM>preferred</EM> keyword for later commands.  By typing
*946379e7Schristosan already known keyword in response to <KBD>C-u M-.</KBD>, one merely
*946379e7Schristoschanges the <EM>preferred</EM> keyword and does nothing more.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosAll keywords known for <KBD>M-.</KBD> are recognized by the <KBD>,</KBD> command
*946379e7Schristoswhen scanning for strings, and strings already marked by any of those
*946379e7Schristosknown keywords are automatically skipped.  If many PO files are opened
*946379e7Schristossimultaneously, each one has its own independent set of known keywords.
There is currently no provision in PO mode for deleting a known
keyword; you have to quit the file (maybe using <KBD>q</KBD>) and reopen
it afresh.  When a PO file is newly brought up in an Emacs window, only
<SAMP>&lsquo;gettext&rsquo;</SAMP> and <SAMP>&lsquo;_&rsquo;</SAMP> are known as keywords, and <SAMP>&lsquo;gettext&rsquo;</SAMP>
is preferred for the <KBD>M-.</KBD> command.  In fact, it is not useful to
prefer <SAMP>&lsquo;_&rsquo;</SAMP>, as this one is already built into the <KBD>M-,</KBD> command.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H2><A NAME="SEC17" HREF="gettext_toc.html#TOC17">4.6  Special Comments preceding Keywords</A></H2>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX138"></A>
*946379e7SchristosIn C programs strings are often used within calls of functions from the
*946379e7Schristos<CODE>printf</CODE> family.  The special thing about these format strings is
*946379e7Schristosthat they can contain format specifiers introduced with <KBD>%</KBD>.  Assume
*946379e7Schristoswe have the code
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosprintf (gettext ("String `%s' has %d characters\n"), s, strlen (s));
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosA possible German translation for the above string might be:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos"%d Zeichen lang ist die Zeichenkette `%s'"
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
A C programmer, even if he cannot speak German, will recognize that
there is something wrong here.  The order of the two format specifiers
has changed, but of course the order of the arguments passed to
<CODE>printf</CODE> has not.  This will most probably lead to problems, because
now the length of the string is interpreted as the address of a string.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
To prevent errors at runtime caused by translations, the <CODE>msgfmt</CODE>
tool can check statically whether the arguments in the original and the
translation string match in type and number.  If this is not the case
and the <SAMP>&lsquo;-c&rsquo;</SAMP> option has been passed to <CODE>msgfmt</CODE>, <CODE>msgfmt</CODE>
will give an error and refuse to produce a MO file.  Thus consistent
use of <SAMP>&lsquo;msgfmt -c&rsquo;</SAMP> will catch the error, so that it cannot
cause problems at runtime.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
For the word order in the above German translation to be correct, one
would have to write
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosThe routines in <CODE>msgfmt</CODE> know about this special notation.
*946379e7Schristos
*946379e7Schristos</P>
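<P>
The following sketch shows the effect of this notation.  It assumes a
C library whose <CODE>printf</CODE> family supports numbered arguments (a POSIX
feature, not part of ISO C); the helper name <CODE>format_german</CODE> is
hypothetical, and the translated format is written inline instead of
being fetched through <CODE>gettext</CODE>, so that the example is
self-contained.

</P>

<PRE>
#include &#60;stdio.h&#62;

/* Hypothetical helper: formats into BUF using the reordered German
   format.  "%1$s" takes the first argument and "%2$d" the second, so
   the arguments are passed in the same order as in the English call.  */
static int
format_german (char *buf, size_t size, const char *s, int length)
{
  return snprintf (buf, size,
                   "%2$d Zeichen lang ist die Zeichenkette `%1$s'",
                   s, length);
}
</PRE>

<P>
The calling code is unchanged; only the format string decides which
argument is printed first.

</P>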
*946379e7Schristos<P>
Because not every string in a program is a format string, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>&lsquo;.po&rsquo;</TT> file.
This might cause problems because a string might contain what looks
like a format specifier even though it is never passed to <CODE>printf</CODE>.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
thinks might be format strings.  There is no absolute rule for this,
*946379e7Schristosonly a heuristic.  In the <TT>&lsquo;.po&rsquo;</TT> file the entry is marked using the
*946379e7Schristos<CODE>c-format</CODE> flag in the <CODE>#,</CODE> comment line (see section <A HREF="gettext_3.html#SEC10">3  The Format of PO Files</A>).
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX139"></A>
*946379e7Schristos<A NAME="IDX140"></A>
The careful reader now might say that this again can cause problems.
The heuristic might guess wrong.  This is true, and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision.  If the <CODE>xgettext</CODE> program
finds a comment containing the words <CODE>xgettext:c-format</CODE> on the
same line as the <CODE>gettext</CODE> keyword, or on the line immediately
preceding it, it will mark the string in any case with
the <CODE>c-format</CODE> flag.  This kind of comment should be used when
<CODE>xgettext</CODE> does not recognize the string as a format string but
it really is one and it should be tested.  Please note that when the
comment is on the same line as the <CODE>gettext</CODE> keyword, it must come
before the string to be translated.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosThis situation happens quite often.  The <CODE>printf</CODE> function is often
*946379e7Schristoscalled with strings which do not contain a format specifier.  Of course
*946379e7Schristosone would normally use <CODE>fputs</CODE> but it does happen.  In this case
*946379e7Schristos<CODE>xgettext</CODE> does not recognize this as a format string but what
*946379e7Schristoshappens if the translation introduces a valid format specifier?  The
*946379e7Schristos<CODE>printf</CODE> function will try to access one of the parameters but none
*946379e7Schristosexists because the original code does not pass any parameters.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7Schristos<CODE>xgettext</CODE> of course could make a wrong decision the other way
*946379e7Schristosround, i.e. a string marked as a format string actually is not a format
string.  In this case <CODE>msgfmt</CODE> might give too many warnings and
*946379e7Schristoswould prevent translating the <TT>&lsquo;.po&rsquo;</TT> file.  The method to prevent
*946379e7Schristosthis wrong decision is similar to the one used above, only the comment
*946379e7Schristosto use must contain the string <CODE>xgettext:no-c-format</CODE>.
*946379e7Schristos
*946379e7Schristos</P>
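<P>
A short sketch of both comments in use follows.  The function names
and messages are hypothetical; each comment is placed on the line
immediately preceding the <CODE>gettext</CODE> keyword, as described above.

</P>

<PRE>
#include &#60;libintl.h&#62;
#include &#60;stdio.h&#62;

/* Hypothetical function.  The message has no format specifier, but
   printf interprets its translation, so ask for it to be checked.
   Returns whatever printf returns.  */
static int
print_banner (void)
{
  /* xgettext:c-format */
  return printf (gettext ("Processing finished\n"));
}

/* Hypothetical function.  The percent sign is literal text and the
   string never reaches printf, so suppress the c-format flag.
   Returns whatever fputs returns.  */
static int
print_discount_note (void)
{
  /* xgettext:no-c-format */
  return fputs (gettext ("Save 20% on every order\n"), stdout);
}
</PRE>

<P>
In the second function the sequence after <SAMP>&lsquo;20&rsquo;</SAMP> could be taken
for a format directive, which is the kind of false guess the
<CODE>xgettext:no-c-format</CODE> comment is meant to override.

</P>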
*946379e7Schristos<P>
*946379e7SchristosIf a string is marked with <CODE>c-format</CODE> and this is not correct the
*946379e7Schristosuser can find out who is responsible for the decision.  See
*946379e7Schristossection <A HREF="gettext_5.html#SEC22">5.1  Invoking the <CODE>xgettext</CODE> Program</A> to see how the <CODE>--debug</CODE> option can be
*946379e7Schristosused for solving this problem.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">4.7  Special Cases of Translatable Strings</A></H2>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7Schristos<A NAME="IDX141"></A>
*946379e7SchristosThe attentive reader might now point out that it is not always possible
to mark translatable strings with <CODE>gettext</CODE> or something like this.
*946379e7SchristosConsider the following case:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos{
*946379e7Schristos  static const char *messages[] = {
*946379e7Schristos    "some very meaningful message",
*946379e7Schristos    "and another one"
*946379e7Schristos  };
*946379e7Schristos  const char *string;
*946379e7Schristos  ...
*946379e7Schristos  string
*946379e7Schristos    = index &#62; 1 ? "a default message" : messages[index];
*946379e7Schristos
  fputs (string, stdout);
*946379e7Schristos  ...
*946379e7Schristos}
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosWhile it is no problem to mark the string <CODE>"a default message"</CODE> it
*946379e7Schristosis not possible to mark the string initializers for <CODE>messages</CODE>.
*946379e7SchristosWhat is to be done?  We have to fulfill two tasks.  First we have to mark the
*946379e7Schristosstrings so that the <CODE>xgettext</CODE> program (see section <A HREF="gettext_5.html#SEC22">5.1  Invoking the <CODE>xgettext</CODE> Program</A>)
can find them, and second we have to translate the strings at runtime
before printing them.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosThe first task can be fulfilled by creating a new keyword, which names a
*946379e7Schristosno-op.  For the second we have to mark all access points to a string
*946379e7Schristosfrom the array.  So one solution can look like this:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos#define gettext_noop(String) String
*946379e7Schristos
*946379e7Schristos{
*946379e7Schristos  static const char *messages[] = {
*946379e7Schristos    gettext_noop ("some very meaningful message"),
*946379e7Schristos    gettext_noop ("and another one")
*946379e7Schristos  };
*946379e7Schristos  const char *string;
*946379e7Schristos  ...
*946379e7Schristos  string
*946379e7Schristos    = index &#62; 1 ? gettext ("a default message") : gettext (messages[index]);
*946379e7Schristos
  fputs (string, stdout);
*946379e7Schristos  ...
*946379e7Schristos}
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosPlease convince yourself that the string which is written by
<CODE>fputs</CODE> is translated in any case.  How to make <CODE>xgettext</CODE> recognize
*946379e7Schristosthe additional keyword <CODE>gettext_noop</CODE> is explained in section <A HREF="gettext_5.html#SEC22">5.1  Invoking the <CODE>xgettext</CODE> Program</A>.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
The above is of course not the only solution.  You could also use
the following one:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos#define gettext_noop(String) String
*946379e7Schristos
*946379e7Schristos{
*946379e7Schristos  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
*946379e7Schristos    gettext_noop ("and another one")
*946379e7Schristos  };
*946379e7Schristos  const char *string;
*946379e7Schristos  ...
*946379e7Schristos  string
*946379e7Schristos    = index &#62; 1 ? gettext_noop ("a default message") : messages[index];
*946379e7Schristos
  fputs (gettext (string), stdout);
*946379e7Schristos  ...
*946379e7Schristos}
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosBut this has a drawback.  The programmer has to take care that
*946379e7Schristoshe uses <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
*946379e7SchristosA use of <CODE>gettext</CODE> could have in rare cases unpredictable results.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case.  But this analysis is
generally not very difficult.  If it does become difficult in some
situation, you can use this second method there.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H2><A NAME="SEC19" HREF="gettext_toc.html#TOC19">4.8  Marking Proper Names for Translation</A></H2>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosShould names of persons, cities, locations etc. be marked for translation
*946379e7Schristosor not?  People who only know languages that can be written with Latin
*946379e7Schristosletters (English, Spanish, French, German, etc.) are tempted to say “no”,
*946379e7Schristosbecause names usually do not change when transported between these languages.
*946379e7SchristosHowever, in general when translating from one script to another, names
*946379e7Schristosare translated too, usually phonetically or by transliteration.  For
*946379e7Schristosexample, Russian or Greek names are converted to the Latin alphabet when
*946379e7Schristosbeing translated to English, and English or French names are converted
*946379e7Schristosto the Katakana script when being translated to Japanese.  This is
*946379e7Schristosnecessary because the speakers of the target language in general cannot
*946379e7Schristosread the script the name is originally written in.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosAs a programmer, you should therefore make sure that names are marked
*946379e7Schristosfor translation, with a special comment telling the translators that it
*946379e7Schristosis a proper name and how to pronounce it.  Like this:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristosprintf (_("Written by %s.\n"),
*946379e7Schristos        /* TRANSLATORS: This is a proper name.  See the gettext
*946379e7Schristos           manual, section Names.  Note this is actually a non-ASCII
*946379e7Schristos           name: The first name is (with Unicode escapes)
*946379e7Schristos           "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
*946379e7Schristos           Pronunciation is like "fraa-swa pee-nar".  */
*946379e7Schristos        _("Francois Pinard"));
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosAs a translator, you should use some care when translating names, because
*946379e7Schristosit is frustrating if people see their names mutilated or distorted.  If
*946379e7Schristosyour language uses the Latin script, all you need to do is to reproduce
*946379e7Schristosthe name as perfectly as you can within the usual character set of your
*946379e7Schristoslanguage.  In this particular case, this means to provide a translation
*946379e7Schristoscontaining the c-cedilla character.  If your language uses a different
*946379e7Schristosscript and the people speaking it don't usually read Latin words, it means
*946379e7Schristostransliteration; but you should still give, in parentheses, the original
*946379e7Schristoswriting of the name -- for the sake of the people that do read the Latin
*946379e7Schristosscript.  Here is an example, using Greek as the target script:
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos#. This is a proper name.  See the gettext
*946379e7Schristos#. manual, section Names.  Note this is actually a non-ASCII
*946379e7Schristos#. name: The first name is (with Unicode escapes)
*946379e7Schristos#. "Fran\u00e7ois" or (with HTML entities) "Fran&#38;ccedil;ois".
*946379e7Schristos#. Pronunciation is like "fraa-swa pee-nar".
*946379e7Schristosmsgid "Francois Pinard"
msgstr "φρασοα πιναρ"
*946379e7Schristos       " (Francois Pinard)"
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosBecause translation of names is such a sensitive domain, it is a good
*946379e7Schristosidea to test your translation before submitting it.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosThe translation project <A HREF="http://sourceforge.net/projects/translation">http://sourceforge.net/projects/translation</A>
*946379e7Schristoshas set up a POT file and translation domain consisting of program author
*946379e7Schristosnames, with better facilities for the translator than those presented here.
*946379e7SchristosNamely, there the original name is written directly in Unicode (rather
*946379e7Schristosthan with Unicode escapes or HTML entities), and the pronunciation is
*946379e7Schristosdenoted using the International Phonetic Alphabet (see
*946379e7Schristos<A HREF="http://www.wikipedia.org/wiki/International_Phonetic_Alphabet">http://www.wikipedia.org/wiki/International_Phonetic_Alphabet</A>).
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos<P>
*946379e7SchristosHowever, we don't recommend this approach for all POT files in all packages,
*946379e7Schristosbecause this would force translators to use PO files in UTF-8 encoding,
*946379e7Schristoswhich is - in the current state of software (as of 2003) - a major hassle
*946379e7Schristosfor translators using GNU Emacs or XEmacs with po-mode.
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<H2><A NAME="SEC20" HREF="gettext_toc.html#TOC20">4.9  Preparing Library Sources</A></H2>
*946379e7Schristos
*946379e7Schristos<P>
*946379e7SchristosWhen you are preparing a library, not a program, for the use of
*946379e7Schristos<CODE>gettext</CODE>, only a few details are different.  Here we assume that
*946379e7Schristosthe library has a translation domain and a POT file of its own.  (If
*946379e7Schristosit uses the translation domain and POT file of the main program, then
*946379e7Schristosthe previous sections apply without changes.)
*946379e7Schristos
*946379e7Schristos</P>
*946379e7Schristos
*946379e7Schristos<OL>
*946379e7Schristos<LI>
*946379e7Schristos
*946379e7SchristosThe library code doesn't call <CODE>setlocale (LC_ALL, "")</CODE>.  It's the
*946379e7Schristosresponsibility of the main program to set the locale.  The library's
*946379e7Schristosdocumentation should mention this fact, so that developers of programs
*946379e7Schristosusing the library are aware of it.
*946379e7Schristos
*946379e7Schristos<LI>
*946379e7Schristos
*946379e7SchristosThe library code doesn't call <CODE>textdomain (PACKAGE)</CODE>, because it
*946379e7Schristoswould interfere with the text domain set by the main program.
*946379e7Schristos
*946379e7Schristos<LI>
*946379e7Schristos
*946379e7SchristosThe initialization code for a program was
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos  setlocale (LC_ALL, "");
*946379e7Schristos  bindtextdomain (PACKAGE, LOCALEDIR);
*946379e7Schristos  textdomain (PACKAGE);
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7SchristosFor a library it is reduced to
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos  bindtextdomain (PACKAGE, LOCALEDIR);
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7SchristosIf your library's API doesn't already have an initialization function,
*946379e7Schristosyou need to create one, containing at least the <CODE>bindtextdomain</CODE>
*946379e7Schristosinvocation.  However, you usually don't need to export and document this
*946379e7Schristosinitialization function: It is sufficient that all entry points of the
*946379e7Schristoslibrary call the initialization function if it hasn't been called before.
*946379e7SchristosThe typical idiom used to achieve this is a static boolean variable that
*946379e7Schristosindicates whether the initialization function has been called. Like this:
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<PRE>
#include &#60;stdbool.h&#62;

static bool libfoo_initialized;
*946379e7Schristos
*946379e7Schristosstatic void
*946379e7Schristoslibfoo_initialize (void)
*946379e7Schristos{
*946379e7Schristos  bindtextdomain (PACKAGE, LOCALEDIR);
*946379e7Schristos  libfoo_initialized = true;
*946379e7Schristos}
*946379e7Schristos
*946379e7Schristos/* This function is part of the exported API.  */
*946379e7Schristosstruct foo *
*946379e7Schristoscreate_foo (...)
*946379e7Schristos{
*946379e7Schristos  /* Must ensure the initialization is performed.  */
*946379e7Schristos  if (!libfoo_initialized)
*946379e7Schristos    libfoo_initialize ();
*946379e7Schristos  ...
*946379e7Schristos}
*946379e7Schristos
*946379e7Schristos/* This function is part of the exported API.  The argument must be
*946379e7Schristos   non-NULL and have been created through create_foo().  */
*946379e7Schristosint
*946379e7Schristosfoo_refcount (struct foo *argument)
*946379e7Schristos{
*946379e7Schristos  /* No need to invoke the initialization function here, because
*946379e7Schristos     create_foo() must already have been called before.  */
*946379e7Schristos  ...
*946379e7Schristos}
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristos<LI>
*946379e7Schristos
*946379e7SchristosThe usual declaration of the <SAMP>&lsquo;_&rsquo;</SAMP> macro in each source file was
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos#include &#60;libintl.h&#62;
*946379e7Schristos#define _(String) gettext (String)
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7Schristosfor a program.  For a library, which has its own translation domain,
*946379e7Schristosit reads like this:
*946379e7Schristos
*946379e7Schristos
*946379e7Schristos<PRE>
*946379e7Schristos#include &#60;libintl.h&#62;
*946379e7Schristos#define _(String) dgettext (PACKAGE, String)
*946379e7Schristos</PRE>
*946379e7Schristos
*946379e7SchristosIn other words, <CODE>dgettext</CODE> is used instead of <CODE>gettext</CODE>.
*946379e7SchristosSimilarly, the <CODE>dngettext</CODE> function should be used in place of the
*946379e7Schristos<CODE>ngettext</CODE> function.
*946379e7Schristos</OL>
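
<P>
As a sketch of the last point, a library function that produces a
plural-sensitive message might look like this.  The domain name
<CODE>"libfoo"</CODE>, the macro <CODE>LIBFOO_DOMAIN</CODE> and the function
<CODE>foo_describe_count</CODE> are hypothetical; a real library would normally
pass its own <CODE>PACKAGE</CODE> macro as the domain.

</P>

<PRE>
#include &#60;libintl.h&#62;
#include &#60;stdio.h&#62;

/* Assumed domain name of a hypothetical library.  */
#define LIBFOO_DOMAIN "libfoo"

/* Hypothetical library function: describes COUNT objects in BUF.
   dngettext looks the message up in the library's own domain and
   leaves the text domain of the main program untouched.  */
static int
foo_describe_count (char *buf, size_t size, unsigned long count)
{
  return snprintf (buf, size,
                   dngettext (LIBFOO_DOMAIN,
                              "%lu object in use", "%lu objects in use",
                              count),
                   count);
}
</PRE>

<P>
When no catalog is installed for the domain, <CODE>dngettext</CODE> falls back
to the English singular form for a count of 1 and to the plural form
otherwise.

</P>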
*946379e7Schristos
*946379e7Schristos<P><HR><P>
*946379e7SchristosGo to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_3.html">previous</A>, <A HREF="gettext_5.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
*946379e7Schristos</BODY>
*946379e7Schristos</HTML>