.\"	$NetBSD: nls.7,v 1.9 2003/05/18 09:10:51 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides the facilities that allow a single
operating system base to be used worldwide.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the process by which system software is developed to support
multiple culture-specific and language-specific conventions.
This is a generalization process by which the system is freed from
relying only on English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the process by which the user environment is customized to
handle input and output appropriately for a specific language and set
of cultural conventions.
This is a specialization process by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of the cultural conventions for a country, together
with all associated translations targeted to its native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers for
developing internationalized software.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and languages.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C specifies five of the following categories, and POSIX adds
LC_MESSAGES.
All six are supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses, and the language
used for message catalogs
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting environment variables
to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of pathname
templates that
.Xr catopen 3
uses to locate the message catalog files of the NLS database.
The templates may contain substitution sequences such as
.Ql %N
for the catalog name and
.Ql %L
for the locale name.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The values of the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev LC_*
environment variables are strings of the form:
.Pp
.Bd -literal
    language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-letter codes for many languages.
Some common language codes are:
.Pp
.Bl -column "PERSIAN (farsi)" "Code" "INDO-EUROPEAN (OTHER)" -offset indent
.It Em "Language Name" Ta Em Code Ta Em "Language Family"
.It ABKHAZIAN Ta AB Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM Ta HAMITIC
.It AFAR Ta AA Ta HAMITIC
.It AFRIKAANS Ta AF Ta GERMANIC
.It ALBANIAN Ta SQ Ta INDO-EUROPEAN (OTHER)
.It AMHARIC Ta AM Ta SEMITIC
.It ARABIC Ta AR Ta SEMITIC
.It ARMENIAN Ta HY Ta INDO-EUROPEAN (OTHER)
.It ASSAMESE Ta AS Ta INDIAN
.It AYMARA Ta AY Ta AMERINDIAN
.It AZERBAIJANI Ta AZ Ta TURKIC/ALTAIC
.It BASHKIR Ta BA Ta TURKIC/ALTAIC
.It BASQUE Ta EU Ta BASQUE
.It BENGALI Ta BN Ta INDIAN
.It BHUTANI Ta DZ Ta ASIAN
.It BIHARI Ta BH Ta INDIAN
.It BISLAMA Ta BI Ta \&
.It BRETON Ta BR Ta CELTIC
.It BULGARIAN Ta BG Ta SLAVIC
.It BURMESE Ta MY Ta ASIAN
.It BYELORUSSIAN Ta BE Ta SLAVIC
.It CAMBODIAN Ta KM Ta ASIAN
.It CATALAN Ta CA Ta ROMANCE
.It CHINESE Ta ZH Ta ASIAN
.It CORSICAN Ta CO Ta ROMANCE
.It CROATIAN Ta HR Ta SLAVIC
.It CZECH Ta CS Ta SLAVIC
.It DANISH Ta DA Ta GERMANIC
.It DUTCH Ta NL Ta GERMANIC
.It ENGLISH Ta EN Ta GERMANIC
.It ESPERANTO Ta EO Ta INTERNATIONAL AUX.
.It ESTONIAN Ta ET Ta FINNO-UGRIC
.It FAROESE Ta FO Ta GERMANIC
.It FIJI Ta FJ Ta OCEANIC/INDONESIAN
.It FINNISH Ta FI Ta FINNO-UGRIC
.It FRENCH Ta FR Ta ROMANCE
.It FRISIAN Ta FY Ta GERMANIC
.It GALICIAN Ta GL Ta ROMANCE
.It GEORGIAN Ta KA Ta IBERO-CAUCASIAN
.It GERMAN Ta DE Ta GERMANIC
.It GREEK Ta EL Ta LATIN/GREEK
.It GREENLANDIC Ta KL Ta ESKIMO
.It GUARANI Ta GN Ta AMERINDIAN
.It GUJARATI Ta GU Ta INDIAN
.It HAUSA Ta HA Ta NEGRO-AFRICAN
.It HEBREW Ta HE Ta SEMITIC
.It HINDI Ta HI Ta INDIAN
.It HUNGARIAN Ta HU Ta FINNO-UGRIC
.It ICELANDIC Ta IS Ta GERMANIC
.It INDONESIAN Ta ID Ta OCEANIC/INDONESIAN
.It INTERLINGUA Ta IA Ta INTERNATIONAL AUX.
.It INTERLINGUE Ta IE Ta INTERNATIONAL AUX.
.It INUKTITUT Ta IU Ta \&
.It INUPIAK Ta IK Ta ESKIMO
.It IRISH Ta GA Ta CELTIC
.It ITALIAN Ta IT Ta ROMANCE
.It JAPANESE Ta JA Ta ASIAN
.It JAVANESE Ta JV Ta OCEANIC/INDONESIAN
.It KANNADA Ta KN Ta DRAVIDIAN
.It KASHMIRI Ta KS Ta INDIAN
.It KAZAKH Ta KK Ta TURKIC/ALTAIC
.It KINYARWANDA Ta RW Ta NEGRO-AFRICAN
.It KIRGHIZ Ta KY Ta TURKIC/ALTAIC
.It KURUNDI Ta RN Ta NEGRO-AFRICAN
.It KOREAN Ta KO Ta ASIAN
.It KURDISH Ta KU Ta IRANIAN
.It LAOTHIAN Ta LO Ta ASIAN
.It LATIN Ta LA Ta LATIN/GREEK
.It LATVIAN Ta LV Ta BALTIC
.It LINGALA Ta LN Ta NEGRO-AFRICAN
.It LITHUANIAN Ta LT Ta BALTIC
.It MACEDONIAN Ta MK Ta SLAVIC
.It MALAGASY Ta MG Ta OCEANIC/INDONESIAN
.It MALAY Ta MS Ta OCEANIC/INDONESIAN
.It MALAYALAM Ta ML Ta DRAVIDIAN
.It MALTESE Ta MT Ta SEMITIC
.It MAORI Ta MI Ta OCEANIC/INDONESIAN
.It MARATHI Ta MR Ta INDIAN
.It MOLDAVIAN Ta MO Ta ROMANCE
.It MONGOLIAN Ta MN Ta \&
.It NAURU Ta NA Ta \&
.It NEPALI Ta NE Ta INDIAN
.It NORWEGIAN Ta NO Ta GERMANIC
.It OCCITAN Ta OC Ta ROMANCE
.It ORIYA Ta OR Ta INDIAN
.It PASHTO Ta PS Ta IRANIAN
.It PERSIAN (farsi) Ta FA Ta IRANIAN
.It POLISH Ta PL Ta SLAVIC
.It PORTUGUESE Ta PT Ta ROMANCE
.It PUNJABI Ta PA Ta INDIAN
.It QUECHUA Ta QU Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM Ta ROMANCE
.It ROMANIAN Ta RO Ta ROMANCE
.It RUSSIAN Ta RU Ta SLAVIC
.It SAMOAN Ta SM Ta OCEANIC/INDONESIAN
.It SANGHO Ta SG Ta NEGRO-AFRICAN
.It SANSKRIT Ta SA Ta INDIAN
.It SCOTS GAELIC Ta GD Ta CELTIC
.It SERBIAN Ta SR Ta SLAVIC
.It SERBO-CROATIAN Ta SH Ta SLAVIC
.It SESOTHO Ta ST Ta NEGRO-AFRICAN
.It SETSWANA Ta TN Ta NEGRO-AFRICAN
.It SHONA Ta SN Ta NEGRO-AFRICAN
.It SINDHI Ta SD Ta INDIAN
.It SINGHALESE Ta SI Ta INDIAN
.It SISWATI Ta SS Ta NEGRO-AFRICAN
.It SLOVAK Ta SK Ta SLAVIC
.It SLOVENIAN Ta SL Ta SLAVIC
.It SOMALI Ta SO Ta HAMITIC
.It SPANISH Ta ES Ta ROMANCE
.It SUNDANESE Ta SU Ta OCEANIC/INDONESIAN
.It SWAHILI Ta SW Ta NEGRO-AFRICAN
.It SWEDISH Ta SV Ta GERMANIC
.It TAGALOG Ta TL Ta OCEANIC/INDONESIAN
.It TAJIK Ta TG Ta IRANIAN
.It TAMIL Ta TA Ta DRAVIDIAN
.It TATAR Ta TT Ta TURKIC/ALTAIC
.It TELUGU Ta TE Ta DRAVIDIAN
.It THAI Ta TH Ta ASIAN
.It TIBETAN Ta BO Ta ASIAN
.It TIGRINYA Ta TI Ta SEMITIC
.It TONGA Ta TO Ta OCEANIC/INDONESIAN
.It TSONGA Ta TS Ta NEGRO-AFRICAN
.It TURKISH Ta TR Ta TURKIC/ALTAIC
.It TURKMEN Ta TK Ta TURKIC/ALTAIC
.It TWI Ta TW Ta NEGRO-AFRICAN
.It UIGUR Ta UG Ta \&
.It UKRAINIAN Ta UK Ta SLAVIC
.It URDU Ta UR Ta INDIAN
.It UZBEK Ta UZ Ta TURKIC/ALTAIC
.It VIETNAMESE Ta VI Ta ASIAN
.It VOLAPUK Ta VO Ta INTERNATIONAL AUX.
.It WELSH Ta CY Ta CELTIC
.It WOLOF Ta WO Ta NEGRO-AFRICAN
.It XHOSA Ta XH Ta NEGRO-AFRICAN
.It YIDDISH Ta YI Ta GERMANIC
.It YORUBA Ta YO Ta NEGRO-AFRICAN
.It ZHUANG Ta ZA Ta \&
.It ZULU Ta ZU Ta NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language as spoken in Denmark,
using the ISO8859-1 character set, is da_DK.ISO8859-1.
Here, da stands for the Danish language and DK stands for Denmark.
The short form da_DK is sufficient to indicate this locale.
.Pp
The environment variables are consulted when a program calls
.Xr setlocale 3
with an empty string as the locale argument; until then, every program
runs in the C locale.
The settings are evaluated in the following order of priority:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set and the
.Ev LC_*
environment variable for a particular category is not set, the value of the
.Ev LANG
environment variable specifies the locale for that category.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for users to override the system default
with the individual
.Ev LC_*
variables.
.It
If
.Ev LC_ALL ,
the
.Ev LC_*
variable for a particular category, and
.Ev LANG
are all unset, that category defaults to the C locale.
The C (or POSIX) locale assumes the 7-bit ASCII character set and defines
information for all six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a particular language makes up
a character set.
The encoding values in a character set provide the interface between
the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provides a range of single-byte character
sets including Latin-1, Latin-2, Arabic, Cyrillic, Hebrew, Greek, and
Turkish.
.It eucJP
The eucJP character set is the industry-standard character set used to
support the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 encoding is supported for all
supported language and territory combinations.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for the
corresponding characters in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translation of messages into various languages and to
make the translated messages available to a program based on the
user's locale, messages must be kept separate from the program and
provided in the form of message catalogs that the program can access
at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective manual pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs, for example with
.Xr gencat 1
for the
.Xr catgets 3
interface.
These catalogs are used by the application to retrieve and display
messages as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a well-supported standard.
Unfortunately, the interface is complicated to use, and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also comes with many additional tools that make programming and
catalog maintenance much easier.
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or
may be stateful (that is, the meaning of a byte may depend on the bytes
that precede it).
ISO C specifies a set of functions using
.Dq wide characters
that can handle multibyte characters properly.
ISO C specifies a wide character as being a fixed number of bits wide
and stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type that can contain one wide character and is used in the same
way as the
.Em char
type is used for one character.
.Em wint_t
can contain one wide character or
.Dv WEOF
(wide EOF).
.Pp
Many functions that operate on
.Em wchar_t
substitute for corresponding functions operating on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing that may rely
on locale-specific strings.
There are two primary approaches to I/O with wide characters:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading, and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters,
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers
.Ql %lc ,
.Ql %C ,
.Ql %ls ,
and
.Ql %S
of the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.