.\"     $NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the process by which system software is developed to support
multiple culture-specific and language-specific conventions.
This is a generalization process by which the system is freed from
its dependence on English strings and other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the process by which the user environment is customized to
handle its input and output appropriately for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX specify the following six standard categories supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The values of the locale environment variables are strings of the form:
.Pp
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KIRUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
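.Pp
A locale name in the format given earlier can be taken apart with a few
string operations.
The following is a minimal sketch in C and is not part of
.Nx ;
the type
.Vt struct locale_parts
and the functions
.Fn copy_part
and
.Fn parse_locale_name
are hypothetical names chosen for illustration, and the field sizes are
arbitrary assumptions.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the fields of a locale name. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy the n bytes at src into dst (capacity cap), truncating if needed. */
static void
copy_part(char *dst, size_t cap, const char *src, size_t n)
{
	if (n >= cap)
		n = cap - 1;
	memcpy(dst, src, n);
	dst[n] = 0;
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Fields that are absent from the name are left as empty strings.
 */
static void
parse_locale_name(const char *name, struct locale_parts *out)
{
	const char *p = name;
	size_t len;

	assert(name != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	/* The language runs up to the first field separator. */
	len = strcspn(p, "_.@");
	copy_part(out->language, sizeof(out->language), p, len);
	p += len;

	if (*p == '_') {
		p++;
		len = strcspn(p, ".@");
		copy_part(out->territory, sizeof(out->territory), p, len);
		p += len;
	}
	if (*p == '.') {
		p++;
		len = strcspn(p, "@");
		copy_part(out->codeset, sizeof(out->codeset), p, len);
		p += len;
	}
	if (*p == '@') {
		p++;
		copy_part(out->modifier, sizeof(out->modifier), p, strlen(p));
	}
}
```
.Ed
.Pp
Given da_DK.ISO8859-1, the sketch yields the language da, the territory DK,
and the codeset ISO8859-1, and leaves the modifier empty.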
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for users to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
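.Pp
The lookup order described in the list above can be sketched in C as
follows.
The functions
.Fn nonempty_env
and
.Fn effective_locale
are hypothetical helpers written for this illustration, not library
interfaces.
The sketch also assumes, as POSIX does, that a variable set to the empty
string counts as unset.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return the value of name if it is set and non-empty, otherwise NULL. */
static const char *
nonempty_env(const char *name)
{
	const char *value = getenv(name);

	return (value != NULL && value[0] != 0) ? value : NULL;
}

/*
 * Hypothetical helper: name the locale that a category such as
 * "LC_TIME" would use.  The order is LC_ALL, then the category's own
 * variable, then LANG, and finally the C locale.
 */
static const char *
effective_locale(const char *category)
{
	const char *value;

	assert(category != NULL);
	if ((value = nonempty_env("LC_ALL")) != NULL)
		return value;
	if ((value = nonempty_env(category)) != NULL)
		return value;
	if ((value = nonempty_env("LANG")) != NULL)
		return value;
	return "C";
}
```
.Ed
.Pp
Programs normally leave this lookup to
.Xr setlocale 3 ,
called with an empty locale string, rather than reading the environment
themselves; the sketch only makes the precedence explicit.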
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
The encoding values in a character set provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish.
The eucJP character set is the industry-standard character set used to support
the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 character set is supported for all
supported languages and territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8 character sets.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
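.Pp
As an illustration of the two interfaces working together, the following
C sketch selects the locale named by the environment and asks for its
codeset.
The function
.Fn current_codeset
is a hypothetical helper, and falling back to the C locale when the
requested locale is unavailable is a design choice of this sketch.
.Bd -literal
```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: select the locale named by the environment and
 * return the codeset name it reports.  If the environment names a
 * locale that cannot be selected, fall back to the C locale.
 */
static const char *
current_codeset(void)
{
	const char *codeset;

	if (setlocale(LC_ALL, "") == NULL)
		(void)setlocale(LC_ALL, "C");
	codeset = nl_langinfo(CODESET);
	assert(codeset != NULL);
	return codeset;
}
```
.Ed
.Pp
The string returned by
.Xr nl_langinfo 3
may be overwritten by later calls, so a caller that needs to keep it
should copy it.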
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
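.Pp
A message lookup through the
.Xr catgets 3
interface might be wrapped as in the following C sketch.
The function
.Fn lookup_message
is a hypothetical helper; it returns the caller's fallback text whenever
the catalog could not be opened or does not contain the message.
.Bd -literal
```c
#include <assert.h>
#include <nl_types.h>
#include <stddef.h>

/*
 * Hypothetical helper: return message msg_id from set set_id of an
 * already opened catalog, or fallback when the catalog descriptor is
 * invalid or the message is missing.
 */
static const char *
lookup_message(nl_catd catd, int set_id, int msg_id, const char *fallback)
{
	assert(fallback != NULL);
	if (catd == (nl_catd)-1)
		return fallback;
	return catgets(catd, set_id, msg_id, fallback);
}
```
.Ed
.Pp
A caller would obtain the descriptor once with
.Xr catopen 3 ,
for example with a catalog name such as myprog (an invented name) and the
NL_CAT_LOCALE flag, keep it open while the returned text is in use, and
release it with
.Xr catclose 3 .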
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type that can hold one wide character, much as the
.Em char
type holds one single-byte character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t
and take the place of the functions operating on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
There are two primary ways of using wide characters for I/O:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
.El
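.Pp
The first approach in the list above, converting at the I/O boundary, can
be sketched in C as follows.
The function
.Fn mb_toupper
is a hypothetical helper and the 256-element buffer is an arbitrary limit
chosen for the sketch; the caller is assumed to have selected a locale
with
.Xr setlocale 3
beforehand.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: upper-case the multibyte string src into dst
 * (capacity cap bytes) by converting it to wide characters, applying
 * towupper() to each one, and converting the result back.  Returns the
 * number of bytes written, not counting the terminator, or (size_t)-1
 * on error.
 */
static size_t
mb_toupper(char *dst, size_t cap, const char *src)
{
	wchar_t wide[256];
	const size_t max = sizeof(wide) / sizeof(wide[0]);
	size_t i, n, written;

	assert(dst != NULL && src != NULL && cap > 0);

	/* Multibyte to wide: each element now holds one character. */
	n = mbstowcs(wide, src, max);
	if (n == (size_t)-1 || n == max)
		return (size_t)-1;	/* invalid sequence or input too long */

	for (i = 0; i < n; i++)
		wide[i] = (wchar_t)towupper((wint_t)wide[i]);

	/* Wide back to multibyte, ready for output. */
	written = wcstombs(dst, wide, cap);
	if (written == (size_t)-1 || written == cap)
		return (size_t)-1;	/* unconvertible, or dst too small */
	return written;
}
```
.Ed
.Pp
Working on wide characters in the middle step means that each array
element is one complete character, whatever its multibyte encoding
looks like.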
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.