.\"	$NetBSD: nls.7,v 1.14 2008/04/30 13:10:57 martin Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd February 21, 2007
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
calling only English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriate for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories,
all supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of a locale environment variable is a string of the form:
.Pp
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-letter codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KURUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in /etc/profile, since this makes it
easiest for the user to override the system default using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 Roman characters and control codes, encoded in a 7-bit
character encoding scheme.
.It ISO 8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, with each
part specifying broad script similarities.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme which is
compatible with the ASCII character set.
.It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16 and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme uses 8-bit code units in a variable-width
encoding that is compatible with ASCII.
The UTF-16 encoding scheme uses 16-bit code units in a variable-width
encoding.
The UTF-32 encoding scheme uses 32-bit code units in a fixed-width encoding.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using 'wide characters' which can handle
multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and operates as the 'char'
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t ,
and substitute for functions operating on 'char'.
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
There are two primary ways to use wide characters for I/O:
.Bl -bullet -offset indent
.It
All I/O is performed using multi-byte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is performed with
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
Wide characters are also used with the formatted I/O functions for
wide characters, such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.
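.Sh EXAMPLES
The priority order described under
.Sx Localization of Information
can be sketched in C as follows.
This is an illustrative sketch only and not the
.Nx
implementation; the function names
.Fn effective_locale
and
.Fn category_locale_from_env
are hypothetical and exist only for this example.
Treating an empty value as unset follows the POSIX rule and is an
assumption beyond the description above.
.Bd -literal
```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: a missing or empty value counts as "not set". */
static int
is_set(const char *value)
{
	return value != NULL && value[0] != 0;
}

/*
 * Hypothetical helper: choose the locale name for one category.
 * lc_all, lc_category and lang are the values of LC_ALL, of the
 * category's own variable (for example LC_TIME) and of LANG;
 * pass NULL for a variable that is not set.
 */
const char *
effective_locale(const char *lc_all, const char *lc_category,
    const char *lang)
{
	if (is_set(lc_all))
		return lc_all;		/* LC_ALL overrides everything */
	if (is_set(lc_category))
		return lc_category;	/* then the category's own variable */
	if (is_set(lang))
		return lang;		/* then the LANG default */
	return "C";			/* finally the C locale */
}

/*
 * Hypothetical wrapper: apply the same order to the real environment.
 * category_name is a variable name such as "LC_TIME".
 */
const char *
category_locale_from_env(const char *category_name)
{
	return effective_locale(getenv("LC_ALL"), getenv(category_name),
	    getenv("LANG"));
}
```
.Ed
.Pp
With only
.Ev LANG
set to da_DK, the sketch selects da_DK for every category; with none of
the variables set, it falls back to the C locale.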