.\"	$NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
calling only English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriate for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Pp
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KIRUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in /etc/profile, since this makes it
easiest for the user to override the system default using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish.
The eucJP character set is the industry-standard character set used to support
the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 character set is supported for all
supported languages and territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8 character sets.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also doesn't support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using 'wide characters' which can handle
multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and operates as the 'char'
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t ,
and substitute for functions operating on 'char'.
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
Wide characters can be used for I/O in two principal ways:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.