.\"     $NetBSD: nls.7,v 1.9 2003/05/18 09:10:51 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides the facilities that let a single
operating system base serve users worldwide.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the process of developing system software so that it supports
multiple culture-specific and language-specific conventions.
This is a generalization process that frees the system from depending
on English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the process of customizing the user environment so that it
handles input and output according to the conventions of a specific
language and culture.
This is a specialization process, in which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of the cultural conventions for some country,
together with all associated translations into its native language, is
called a
.Dq locale .
.Pp
.Nx
provides extensive support to help programmers and system developers
write internationalized software.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and languages.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific
conventions drawn from the list above.
.Nx
supports the following six standard categories: five are specified by
ISO C, and LC_MESSAGES is added by POSIX.
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the language of messages and the format of affirmative and negative
responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
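.Pp
The locale currently in effect for a category can be queried by passing
a null pointer as the second argument of
.Xr setlocale 3 ,
which leaves the locale unchanged.
The following minimal sketch prints the locale of each of the six
categories and returns how many of them match a given locale name.
The table
.Va nls_categories
and the function
.Fn report_categories
are hypothetical names chosen for this example.
.Bd -literal -offset indent
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table pairing each category with its name. */
static const struct {
    int         id;
    const char *name;
} nls_categories[] = {
    { LC_COLLATE,  "LC_COLLATE" },
    { LC_CTYPE,    "LC_CTYPE" },
    { LC_MESSAGES, "LC_MESSAGES" },
    { LC_MONETARY, "LC_MONETARY" },
    { LC_NUMERIC,  "LC_NUMERIC" },
    { LC_TIME,     "LC_TIME" },
};

/*
 * Print the locale in effect for each category and return how
 * many categories are set to the locale named by want.
 * setlocale(category, NULL) only queries; it changes nothing.
 */
int
report_categories(const char *want)
{
    char line[128];
    size_t i;
    int matches = 0;

    for (i = 0; i < sizeof(nls_categories) / sizeof(nls_categories[0]);
        i++) {
        const char *cur = setlocale(nls_categories[i].id, NULL);

        if (cur == NULL)
            cur = "(unknown)";
        /* puts() appends the newline. */
        snprintf(line, sizeof(line), "%-12s %s",
            nls_categories[i].name, cur);
        puts(line);
        if (strcmp(cur, want) == 0)
            matches++;
    }
    return matches;
}
.Ed
.Pp
For example, immediately after a successful call to
setlocale(LC_ALL, "C"),
report_categories("C") returns 6, because setting LC_ALL sets every
category.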
.Pp
Localization of the system is achieved by setting environment variables
that identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of pathname
templates that
.Xr catopen 3
uses to locate the message catalog files of the NLS database.
In each template,
.Ql %N
is replaced by the catalog name and
.Ql %L
by the locale name.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale, as described
below.
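.Pp
To make the template substitution concrete, the following sketch
expands a single
.Ev NLSPATH
template.
It is an illustration under stated assumptions, not the
.Xr catopen 3
implementation: the function
.Fn expand_nlspath
is hypothetical, it handles only
.Ql %N ,
.Ql %L ,
and
.Ql %% ,
and it copies any other
.Ql %
sequence through unchanged.
.Bd -literal -offset indent
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: expand one NLSPATH template into buf.
 * Handles only %N (catalog name), %L (locale name) and %%;
 * any other character, including other % sequences, is copied
 * unchanged.  Returns 0 on success, -1 if buf is too small.
 */
int
expand_nlspath(const char *tmpl, const char *name, const char *locale,
    char *buf, size_t len)
{
    size_t used = 0;
    const char *p;

    if (len == 0)
        return -1;
    for (p = tmpl; *p != 0; p++) {
        char lit[2] = { 0, 0 };
        const char *sub = lit;
        size_t n;

        if (p[0] == '%' && p[1] == 'N') {
            sub = name;
            p++;
        } else if (p[0] == '%' && p[1] == 'L') {
            sub = locale;
            p++;
        } else if (p[0] == '%' && p[1] == '%') {
            sub = "%";
            p++;
        } else {
            lit[0] = *p;        /* ordinary character */
        }
        n = strlen(sub);
        if (used + n >= len)    /* keep room for the terminator */
            return -1;
        memcpy(buf + used, sub, n);
        used += n;
    }
    buf[used] = 0;
    return 0;
}
.Ed
.Pp
With the illustrative template
.Pa /usr/share/nls/%L/%N.cat ,
the catalog name hello, and the locale da_DK.ISO8859-1, the sketch
produces
.Pa /usr/share/nls/da_DK.ISO8859-1/hello.cat .
A real lookup tries each colon-separated template in turn until it
finds a catalog it can open.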
.Pp
The values of these environment variables, except
.Ev NLSPATH ,
are strings of the form:
.Pp
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard,
which defines two-letter codes for many languages; in locale names
these codes are written in lowercase.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
KURUNDI	RN	NEGRO-AFRICAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language as spoken in Denmark,
using the ISO8859-1 character set, is
.Ql da_DK.ISO8859-1 .
Here,
.Ql da
stands for the Danish language and
.Ql DK
stands for Denmark.
The short form
.Ql da_DK
is sufficient to indicate this locale.
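.Pp
The structure of a locale name can be made concrete with a small
parser.
The sketch below, with the hypothetical type
.Vt struct locale_name
and function
.Fn parse_locale_name ,
splits a name into the four fields of the form shown above and leaves
missing optional fields empty.
It only splits the string: it does not check that the fields are valid
codes or that the locale is installed, and it truncates fields longer
than its fixed buffers.
.Bd -literal -offset indent
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_name {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copy n bytes of src into dst (size dstlen), truncating if needed. */
static void
copy_field(char *dst, size_t dstlen, const char *src, size_t n)
{
    if (n >= dstlen)
        n = dstlen - 1;
    memcpy(dst, src, n);
    dst[n] = 0;
}

/* Split language[_territory][.codeset][@modifier] into its fields. */
void
parse_locale_name(const char *s, struct locale_name *ln)
{
    size_t n;

    memset(ln, 0, sizeof(*ln));
    n = strcspn(s, "_.@");              /* language ends at any delimiter */
    copy_field(ln->language, sizeof(ln->language), s, n);
    s += n;
    if (*s == '_') {
        s++;
        n = strcspn(s, ".@");
        copy_field(ln->territory, sizeof(ln->territory), s, n);
        s += n;
    }
    if (*s == '.') {
        s++;
        n = strcspn(s, "@");
        copy_field(ln->codeset, sizeof(ln->codeset), s, n);
        s += n;
    }
    if (*s == '@')
        copy_field(ln->modifier, sizeof(ln->modifier), s + 1,
            strlen(s + 1));
}
.Ed
.Pp
Applied to da_DK.ISO8859-1, it yields the language da, the territory
DK, the codeset ISO8859-1, and an empty modifier.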
.Pp
The environment variables are consulted in the following order of
priority:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If
.Ev LC_ALL
is not set, each individual category uses the locale specified by its
corresponding environment variable.
.It
If
.Ev LC_ALL
is not set and the
.Ev LC_*
variable for a particular category is not set either, the value of the
.Ev LANG
environment variable specifies the locale for that category.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for users to override the system default
with the individual
.Ev LC_*
variables.
.It
If none of
.Ev LC_ALL ,
the category's own
.Ev LC_*
variable, and
.Ev LANG
is set, the locale for that category defaults to the C locale.
The C (or POSIX) locale assumes the 7-bit ASCII character set and
defines information for all six categories.
.El
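.Pp
The lookup order above can be sketched in C as follows.
The function
.Fn effective_locale
is hypothetical; it only inspects the environment and does not call
.Xr setlocale 3 ,
so it does not check whether the named locale is installed.
Following POSIX, it treats a variable that is set to the empty string
as if it were unset.
.Bd -literal -offset indent
#include <stdlib.h>

/* Return the value of var, or NULL if it is unset or empty. */
static const char *
nonempty_env(const char *var)
{
    const char *v = getenv(var);

    return (v != NULL && *v != 0) ? v : NULL;
}

/*
 * Hypothetical helper: name of the locale that the environment
 * selects for the category whose variable is category_var
 * (for example "LC_TIME").  Order: LC_ALL, the category's own
 * variable, LANG, and finally the C locale.
 */
const char *
effective_locale(const char *category_var)
{
    const char *v;

    if ((v = nonempty_env("LC_ALL")) != NULL)
        return v;
    if ((v = nonempty_env(category_var)) != NULL)
        return v;
    if ((v = nonempty_env("LANG")) != NULL)
        return v;
    return "C";
}
.Ed
.Pp
For example, with LANG=da_DK.ISO8859-1 and LC_TIME=en_US.UTF-8 in the
environment and no other locale variables set,
effective_locale("LC_TIME") returns en_US.UTF-8 and
effective_locale("LC_NUMERIC") returns da_DK.ISO8859-1.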
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a particular language makes up
a character set.
The encoding values in a character set provide the interface between
the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte
character set support that includes Latin-1, Latin-2, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
.It eucJP
The eucJP character set is the industry-standard character set used to
support the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 encoding is supported for all
supported languages and territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
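.Pp
The difference between single-byte and multibyte encodings shows up in
byte counts.
For example, the word
.Dq caf\('e
takes four bytes in ISO8859-1, where
.Dq \('e
is the single byte 0xE9, but five bytes in UTF-8, where it is the
two-byte sequence 0xC3 0xA9.
The sketch below counts the characters in a UTF-8 byte string by
skipping continuation bytes.
The function name
.Fn utf8_count_chars
is hypothetical, and the sketch assumes well-formed input; ordinary
programs should rely on the locale-aware multibyte and wide-character
functions described later in this page instead.
.Bd -literal -offset indent
#include <stddef.h>

/*
 * Count the characters in a UTF-8 byte sequence by counting every
 * byte that is not a continuation byte (bit pattern 10xxxxxx).
 * Assumes well-formed UTF-8; no validation is done.
 */
size_t
utf8_count_chars(const unsigned char *s, size_t nbytes)
{
    size_t i, count = 0;

    for (i = 0; i < nbytes; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* "cafe" with an acute accent on the e: 5 bytes, 4 characters. */
static const unsigned char cafe_utf8[] = { 0x63, 0x61, 0x66, 0xC3, 0xA9 };
.Ed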
.Ss Font Sets
A font set contains the glyphs displayed on the screen for the
corresponding characters in a character set.
A display must have a suitable font to display a character set.
If suitable fonts are available to the X server, X clients can
include support for different character sets.
.Xr xterm 1
includes support for the UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
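.Pp
A program runs in the C locale until it calls
.Xr setlocale 3 ,
so the environment settings described above have no effect before that
call.
The sketch below adopts the locale selected by the environment by
passing an empty locale name, then asks
.Xr nl_langinfo 3
for the name of the character encoding through the CODESET item.
The function name
.Fn init_locale_codeset
is hypothetical, and the exact spelling of codeset names varies between
systems.
.Bd -literal -offset indent
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper: adopt the locale selected by the environment
 * (LC_ALL, LC_*, LANG) and return the name of its character
 * encoding.  If the environment names a locale that is not
 * installed, fall back to the C locale.
 */
const char *
init_locale_codeset(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        (void)setlocale(LC_ALL, "C");
    return nl_langinfo(CODESET);
}
.Ed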
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs, for example with
.Xr gencat 1 .
These catalogs are used by the application to retrieve and display
messages as needed.
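.Pp
As an illustration, a message source file for a hypothetical catalog
named
.Pa hello.msg
might look like the following.
A line beginning with a dollar sign followed by a blank is a comment,
.Ql $set
starts a numbered message set, and each remaining line holds a message
number, a single blank, and the message text.
.Bd -literal -offset indent
$ hello.msg: hypothetical message source for gencat(1)
$set 1
1 Hello, world
2 Goodbye
.Ed
.Pp
Such a file is compiled into a catalog with a command like
.Dl gencat hello.cat hello.msg
and each translation is kept in a separate source file per language.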
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and the catalogs are
difficult to maintain.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported by an
increasing number of systems.
It also comes with many additional tools that make programming and
catalog maintenance much easier.
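.Pp
The following sketch shows the usual
.Xr catgets 3
pattern: open a catalog with
.Xr catopen 3 ,
look up a message by set and message number, and fall back to a
built-in default string when the catalog or message is missing.
The helper
.Fn get_message
and any catalog name passed to it are hypothetical.
Because the string returned by
.Xr catgets 3
belongs to the open catalog, the helper copies it before calling
.Xr catclose 3 .
.Bd -literal -offset indent
#include <nl_types.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy message msg_id of set set_id from the
 * catalog "name" into buf, or copy fallback if the catalog or the
 * message is missing.  NL_CAT_LOCALE asks catopen() to use the
 * LC_MESSAGES locale when it expands %L in NLSPATH.
 */
void
get_message(const char *name, int set_id, int msg_id,
    const char *fallback, char *buf, size_t len)
{
    nl_catd cat = catopen(name, NL_CAT_LOCALE);
    const char *msg = fallback;

    if (cat != (nl_catd)-1)
        msg = catgets(cat, set_id, msg_id, fallback);
    /* Copy before catclose(): msg may point into the catalog. */
    snprintf(buf, len, "%s", msg);
    if (cat != (nl_catd)-1)
        (void)catclose(cat);
}
.Ed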
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode,
and some of their encodings are stateful: how a byte sequence is
interpreted depends on shift sequences that appeared earlier in the
input.
ISO C specifies a set of functions using
.Dq wide characters ,
which can handle multibyte characters properly.
ISO C specifies that a wide character has a fixed number of bits and is
stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can hold one wide character; it is used much as the
.Em char
type is used for one single-byte character.
.Em wint_t
can hold any
.Em wchar_t
value as well as WEOF (wide EOF).
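.Pp
The distinction between the two types matters when reading input.
.Xr fgetwc 3
returns
.Em wint_t
rather than
.Em wchar_t
so that WEOF can be told apart from every valid wide character; storing
the result in a
.Em wchar_t
could lose that distinction.
The sketch below, whose function name
.Fn count_wide_chars
is hypothetical, counts the wide characters remaining in a stream.
.Bd -literal -offset indent
#include <stdio.h>
#include <wchar.h>

/*
 * Count the wide characters remaining in a stream.  The result of
 * fgetwc() is kept in a wint_t so that WEOF is distinguishable.
 * An encoding error also ends the loop with WEOF (errno EILSEQ).
 */
size_t
count_wide_chars(FILE *fp)
{
    wint_t wc;
    size_t n = 0;

    while ((wc = fgetwc(fp)) != WEOF)
        n++;
    return n;
}
.Ed
.Pp
The stream must not already be byte-oriented: once a byte I/O function
such as
.Xr fputs 3
has been used on a stream, the wide-character functions must not be
applied to it.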
.Pp
Many functions that operate on
.Em wchar_t
are direct substitutes for functions operating on
.Em char ;
see
.Xr wmemchr 3
and
.Xr towlower 3
for details.
Some additional functions that operate on
.Em wchar_t
have no
.Em char
counterpart; see
.Xr wctype 3
and
.Xr wctrans 3
for details.
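.Pp
The class and mapping functions take their names as strings, so a
program can select them at run time.
The sketch below, with the hypothetical helpers
.Fn count_class
and
.Fn upcase ,
counts the characters that belong to a named class with
.Xr wctype 3
and
.Fn iswctype ,
and converts a string to upper case with
.Xr wctrans 3
and
.Fn towctrans .
Which names are recognized depends on the LC_CTYPE category; ISO C
requires classes such as alpha and digit and the mappings tolower and
toupper in every locale.
.Bd -literal -offset indent
#include <wchar.h>
#include <wctype.h>

/*
 * Count the characters of s in the class named class_name, such as
 * "alpha" or "digit".  Returns (size_t)-1 if the current LC_CTYPE
 * locale does not define that class (wctype() returned 0).
 */
size_t
count_class(const wchar_t *s, const char *class_name)
{
    wctype_t cls = wctype(class_name);
    size_t n = 0;

    if (cls == 0)
        return (size_t)-1;
    for (; *s != 0; s++)
        if (iswctype((wint_t)*s, cls))
            n++;
    return n;
}

/* Convert s to upper case in place using the "toupper" mapping. */
void
upcase(wchar_t *s)
{
    wctrans_t up = wctrans("toupper");

    if (up == 0)
        return;
    for (; *s != 0; s++)
        *s = (wchar_t)towctrans((wint_t)*s, up);
}
.Ed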
.Pp
Wide characters should be used for all I/O processing that may rely
on locale-specific strings.
There are two primary ways of using wide characters for I/O:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading, and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
and
.Xr wcsrtombs 3 ,
with help from
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide
characters, such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 .
The conventional formatted I/O functions also accept the
.Ql %lc
and
.Ql %ls
conversions (and their synonyms
.Ql %C
and
.Ql %S )
for wide characters and wide-character strings.
.El
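.Pp
The first approach can be sketched as follows.
The hypothetical helper
.Fn mb_truncate
shortens a string to a given number of characters without cutting a
multibyte character in half: it converts the input to wide characters
with
.Xr mbstowcs 3 ,
truncates the wide string, and converts the result back with
.Xr wcstombs 3 .
As POSIX permits, a null destination pointer makes each conversion
function report the required length without storing anything.
The program is assumed to have already called
.Xr setlocale 3
so that LC_CTYPE selects the user's encoding.
.Bd -literal -offset indent
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: copy at most maxchars characters of the
 * multibyte string s into out (outlen bytes) without splitting a
 * multibyte character.  Uses the current LC_CTYPE locale.
 * Returns 0 on success, or -1 on an invalid sequence, allocation
 * failure, or if out is too small.
 */
int
mb_truncate(const char *s, size_t maxchars, char *out, size_t outlen)
{
    size_t nchars, nbytes;
    wchar_t *ws;

    /* With a null destination, mbstowcs() only counts characters. */
    nchars = mbstowcs(NULL, s, 0);
    if (nchars == (size_t)-1)
        return -1;
    ws = malloc((nchars + 1) * sizeof(*ws));
    if (ws == NULL)
        return -1;
    (void)mbstowcs(ws, s, nchars + 1);
    if (nchars > maxchars)
        ws[maxchars] = 0;       /* cut between whole characters */

    /* Check the converted size first so out is always terminated. */
    nbytes = wcstombs(NULL, ws, 0);
    if (nbytes == (size_t)-1 || nbytes >= outlen) {
        free(ws);
        return -1;
    }
    (void)wcstombs(out, ws, outlen);
    free(ws);
    return 0;
}
.Ed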
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.