.\"     $NetBSD: nls.7,v 1.14 2008/04/30 13:10:57 martin Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd February 21, 2007
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the process by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is freed from
depending on English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriately for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The values of these environment variables are strings of the form:
.Pp
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KIRUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form da_DK is sufficient to indicate this locale.
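.Pp
As a minimal sketch of how such a name can be taken apart, the following C
fragment splits a locale name into its four fields.
The
.Fn split_locale
function, the
.Vt struct locale_parts
type, and the field sizes are hypothetical and chosen for illustration;
they are not part of any
.Nx
interface.
Absent fields are left empty and over-long fields are truncated.
.Bd -literal -offset indent
#include <stdio.h>
#include <string.h>

/* Hypothetical container; the field sizes are arbitrary. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy the len-byte field at s into dst, truncating to fit. */
static void
copy_field(char *dst, size_t size, const char *s, size_t len)
{
	snprintf(dst, size, "%.*s", (int)len, s);
}

/* Split language[_territory][.codeset][@modifier] into its fields. */
static void
split_locale(const char *name, struct locale_parts *p)
{
	size_t n;

	memset(p, 0, sizeof(*p));
	/* The language runs up to the first delimiter, if any. */
	n = strcspn(name, "_.@");
	copy_field(p->language, sizeof(p->language), name, n);
	name += n;
	if (*name == '_') {
		name++;
		n = strcspn(name, ".@");
		copy_field(p->territory, sizeof(p->territory), name, n);
		name += n;
	}
	if (*name == '.') {
		name++;
		n = strcspn(name, "@");
		copy_field(p->codeset, sizeof(p->codeset), name, n);
		name += n;
	}
	if (*name == '@')
		copy_field(p->modifier, sizeof(p->modifier), name + 1,
		    strlen(name + 1));
}
.Ed
.Pp
Called with
.Dq da_DK.ISO8859-1 ,
this sketch stores
.Dq da ,
.Dq DK ,
and
.Dq ISO8859-1
in the language, territory, and codeset fields and leaves the modifier
empty.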
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in /etc/profile, since this makes it
easiest for the user to override the system default using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
.El
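.Pp
The lookup order above can be sketched in C as follows.
The
.Fn resolve_locale
helper is hypothetical and only mirrors the rules listed here; programs
normally let
.Xr setlocale 3
perform this lookup.
The sketch assumes, as POSIX specifies, that an empty value is treated the
same as an unset variable.
.Bd -literal -offset indent
#include <stdlib.h>

/*
 * Hypothetical helper: return the locale name in effect for the
 * category whose environment variable is named by "category",
 * for example "LC_TIME".
 */
static const char *
resolve_locale(const char *category)
{
	const char *v;

	/* LC_ALL overrides every category. */
	if ((v = getenv("LC_ALL")) != NULL && *v)
		return v;
	/* Next comes the variable named after the category itself. */
	if ((v = getenv(category)) != NULL && *v)
		return v;
	/* LANG supplies the default for all categories. */
	if ((v = getenv("LANG")) != NULL && *v)
		return v;
	/* Nothing is set: fall back to the C locale. */
	return "C";
}
.Ed
.Pp
With only
.Ev LANG
set to da_DK, a call such as
.Fn resolve_locale \*qLC_TIME\*q
returns da_DK; setting
.Ev LC_ALL
afterwards takes precedence over both
.Ev LANG
and
.Ev LC_TIME .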
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 Roman characters and control codes, encoded in a 7-bit
character encoding scheme.
.It ISO 8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, with each
part covering a group of languages with broadly similar scripts.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme which is
compatible with the ASCII character set.
.It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16 and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme uses 8-bit units in a variable-width encoding
which is compatible with ASCII.
The UTF-16 encoding scheme uses 16-bit units in a variable-width encoding.
The UTF-32 encoding scheme uses 32-bit units in a fixed-width encoding.
.El
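.Pp
To illustrate what
.Dq variable-width
means for UTF-8, the following sketch computes how many bytes a Unicode
code point occupies when encoded.
The
.Fn utf8_length
function is a hypothetical helper written for this page; the ranges follow
the UTF-8 definition.
.Bd -literal -offset indent
/*
 * Hypothetical helper: number of bytes UTF-8 uses for the code
 * point cp, or 0 if cp cannot be encoded.
 */
static int
utf8_length(unsigned long cp)
{
	if (cp < 0x80)
		return 1;	/* ASCII range, same byte as in ASCII */
	if (cp < 0x800)
		return 2;
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* surrogates are reserved for UTF-16 */
	if (cp < 0x10000)
		return 3;
	if (cp <= 0x10FFFF)
		return 4;
	return 0;		/* beyond the Unicode code space */
}
.Ed
.Pp
The one-byte case is what makes UTF-8 compatible with ASCII: every ASCII
character keeps its single-byte value.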
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
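.Pp
As a brief sketch of these two interfaces working together, the following
fragment selects a locale and asks for the name of its codeset.
The
.Fn codeset_of
wrapper is hypothetical; only
.Xr setlocale 3
and
.Xr nl_langinfo 3
are real interfaces.
.Bd -literal -offset indent
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper: switch every category to the named locale
 * and return the name of its codeset, or NULL if the locale is
 * not available.  Passing "" selects the locale from the
 * environment.
 */
static const char *
codeset_of(const char *name)
{
	if (setlocale(LC_ALL, name) == NULL)
		return NULL;
	return nl_langinfo(CODESET);
}
.Ed
.Pp
The string that is returned depends on the system and on the locale, so
programs should compare it rather than assume a particular spelling.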
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is being supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
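.Pp
A minimal sketch of the
.Xr catgets 3
interface is shown below.
The
.Fn copy_message
helper is hypothetical, and the catalog name and the set and message
numbers are whatever the application defined when it built its catalog.
.Bd -literal -offset indent
#include <nl_types.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy message msg_id of set set_id from the
 * named catalog into buf, or copy fallback if the catalog or the
 * message cannot be found.
 */
static void
copy_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *buf, size_t len)
{
	nl_catd catd;

	/* NL_CAT_LOCALE makes the search follow LC_MESSAGES. */
	catd = catopen(catalog, NL_CAT_LOCALE);
	if (catd == (nl_catd)-1) {
		snprintf(buf, len, "%s", fallback);
		return;
	}
	/*
	 * Copy before closing: the pointer returned by catgets()
	 * may refer to storage owned by the open catalog.
	 */
	snprintf(buf, len, "%s",
	    catgets(catd, set_id, msg_id, fallback));
	catclose(catd);
}
.Ed
.Pp
Keeping the English text as the fallback argument means the program still
produces usable output when no catalog is installed for the user's locale.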
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and plays the same role
that the
.Em char
type does for one single-byte character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t
and substitute for functions operating on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
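.Pp
As a small sketch of the wide-character counterparts in use, the following
fragment converts a wide string to lower case in place.
The
.Fn wcs_tolower
helper is hypothetical;
.Xr towlower 3
is the wide-character substitute for
.Xr tolower 3 .
.Bd -literal -offset indent
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: lower-case a null-terminated wide string
 * in place, according to the LC_CTYPE category of the current
 * locale.
 */
static void
wcs_tolower(wchar_t *s)
{
	for (; *s != 0; s++)
		*s = (wchar_t)towlower((wint_t)*s);
}
.Ed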
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
There are two primary ways of using wide characters for I/O:
.Bl -bullet -offset indent
.It
All I/O is performed using multi-byte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is performed by
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used by the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and by the wide-character conversion specifiers %lc, %C, %ls, and %S of
the conventional formatted I/O functions.
.El
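.Pp
The first approach, converting input to wide characters right after
reading, might be sketched as follows.
The
.Fn mb_to_wide
helper is hypothetical.
It assumes the POSIX behaviour in which
.Xr mbstowcs 3
called with a null destination only counts the characters, and it assumes
the caller has already selected a locale with
.Xr setlocale 3 .
.Bd -literal -offset indent
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert a multi-byte string to a newly
 * allocated wide string using the current LC_CTYPE category.
 * Returns NULL on an invalid sequence or if memory runs out.
 * The caller frees the result.
 */
static wchar_t *
mb_to_wide(const char *mbs)
{
	size_t n;
	wchar_t *ws;

	/* First pass: count characters (assumed POSIX extension). */
	n = mbstowcs(NULL, mbs, 0);
	if (n == (size_t)-1)
		return NULL;
	ws = malloc((n + 1) * sizeof(*ws));
	if (ws == NULL)
		return NULL;
	/* Second pass: convert, including the terminating null. */
	mbstowcs(ws, mbs, n + 1);
	return ws;
}
.Ed
.Pp
The resulting string can then be processed one character at a time with
the wide-character functions and converted back with
.Xr wcstombs 3
just before it is written.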
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.