.\" $NetBSD: nls.7,v 1.13 2007/03/02 20:28:54 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd February 21, 2007
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
calling only English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriate for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Pp
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-letter codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KIRUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in /etc/profile, since this makes it
easiest for the user to override the system default using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 Roman characters and control codes, encoded in a 7-bit
character encoding scheme.
.It ISO 8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, with each
part specifying broad script similarities.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme which is
compatible with the ASCII character set.
.It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16 and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme uses 8-bit, variable-width encodings, which are
compatible with ASCII.
The UTF-16 encoding scheme uses 16-bit, variable-width encodings.
The UTF-32 encoding scheme uses 32-bit, fixed-width encodings.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for the
corresponding characters in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also doesn't support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is being supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using 'wide characters' which can handle
multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and operates as the 'char'
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t ,
and substitute for functions operating on 'char'.
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary ways of using wide characters for I/O are:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is handled by
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters,
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S of
the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.
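.Sh EXAMPLES
The priority order described under
.Sx Localization of Information
can be sketched in C roughly as follows.
This is a minimal illustration under stated assumptions, not the libc
implementation: the helper names nonempty(), effective_locale(), and
effective_locale_env() are hypothetical, and the sketch assumes that a
variable set to the empty string is treated the same as an unset one.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: treat NULL and "" alike as "not set" (assumption). */
static const char *
nonempty(const char *s)
{
	return (s != NULL && s[0] != '\0') ? s : NULL;
}

/*
 * Hypothetical helper, not a libc interface: choose the locale name for
 * one category from already-fetched variable values, in priority order
 * LC_ALL, then the category's own LC_* variable, then LANG, then "C".
 */
const char *
effective_locale(const char *lc_all, const char *lc_category,
    const char *lang)
{
	const char *v;

	if ((v = nonempty(lc_all)) != NULL)
		return v;
	if ((v = nonempty(lc_category)) != NULL)
		return v;
	if ((v = nonempty(lang)) != NULL)
		return v;
	return "C";
}

/* Convenience wrapper: read the three values from the environment. */
const char *
effective_locale_env(const char *category_name)
{
	return effective_locale(getenv("LC_ALL"), getenv(category_name),
	    getenv("LANG"));
}
```

For instance, effective_locale_env("LC_TIME") would return the name that
applies to the
.Ev LC_TIME
category.
Programs do not normally need such a helper, since calling
.Xr setlocale 3
with an empty locale string selects the locale from the environment.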