.\" Copyright (c) 1993 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"	@(#)multibyte.3	5.1 (Berkeley) 03/02/93
.\"
.Dd March 2, 1993
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and encode each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
.Pq up to Dv MB_LEN_MAX
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
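.Pp
As a minimal sketch of how
.Fn mblen
might be used to step through a string,
the hypothetical helper
.Fn count_mbchars
.Pq not part of the library
counts the multibyte characters in a null-terminated string.
It assumes that the string begins in the initial shift state.
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>

/*
 * count_mbchars is a hypothetical helper, not a libc function.
 * Returns the number of multibyte characters in s,
 * or -1 if an invalid sequence is found.
 */
long
count_mbchars(const char *s)
{
	size_t left = strlen(s);	/* bytes before the null byte */
	long count = 0;

	(void)mblen(NULL, 0);		/* reset to the initial shift state */
	while (left > 0) {
		int n = mblen(s, left);	/* examine at most left bytes */

		if (n <= 0)		/* no character could be recognized */
			return (-1);
		s += n;
		left -= (size_t)n;
		count++;
	}
	return (count);
}
```
.Ed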
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
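.Pp
The following sketch shows one way to decode the first character
of a string with
.Fn mbtowc .
The helper name
.Fn first_wchar
is an assumption made for illustration;
.Dv MB_CUR_MAX
is used as the byte limit because no character in the current locale
can be longer than that.
.Bd -literal -offset indent
```c
#include <stdlib.h>

/*
 * first_wchar is a hypothetical helper, not a libc function.
 * Stores the first character of s in *out and returns the number
 * of bytes it occupies, 0 if s is empty, or -1 if the leading
 * bytes do not form a valid multibyte character.
 */
int
first_wchar(const char *s, wchar_t *out)
{
	(void)mbtowc(NULL, NULL, 0);	/* reset to the initial shift state */
	return (mbtowc(out, s, MB_CUR_MAX));
}
```
.Ed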
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
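.Pp
One way to satisfy the size requirement is to give
.Fn wctomb
a buffer of
.Dv MB_LEN_MAX
bytes, which is sufficient in any locale.
The sketch below assumes a hypothetical helper,
.Fn encode_wchar ,
that also null terminates the result;
the extra byte in the buffer is there only for that terminator.
.Bd -literal -offset indent
```c
#include <limits.h>
#include <stdlib.h>

/*
 * encode_wchar is a hypothetical helper, not a libc function.
 * Converts wc into buf, null terminates it, and returns the
 * number of bytes in the multibyte character, or -1 if wc has
 * no representation in the current locale.
 */
int
encode_wchar(wchar_t wc, char buf[MB_LEN_MAX + 1])
{
	int n;

	(void)wctomb(NULL, 0);		/* reset to the initial shift state */
	n = wctomb(buf, wc);
	if (n < 0)
		return (-1);
	buf[n] = 0;			/* wctomb does not terminate */
	return (n);
}
```
.Ed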
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
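.Pp
Because the terminator is stored only if there is room,
a caller that needs a terminated result may want to reserve
one element for it.
A minimal sketch, using a hypothetical wrapper named
.Fn to_wide :
.Bd -literal -offset indent
```c
#include <stdlib.h>

/*
 * to_wide is a hypothetical wrapper, not a libc function.
 * Converts src into dst, which holds cap elements, and always
 * null terminates the result.  Returns the number of wide
 * characters stored before the terminator, or (size_t)-1 on
 * an invalid multibyte character or a zero-sized buffer.
 */
size_t
to_wide(wchar_t *dst, size_t cap, const char *src)
{
	size_t n;

	if (cap == 0)
		return ((size_t)-1);
	n = mbstowcs(dst, src, cap - 1);	/* keep one slot free */
	if (n == (size_t)-1)
		return (n);
	dst[n] = 0;			/* terminate even when truncated */
	return (n);
}
```
.Ed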
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
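.Pp
The same precaution applies in the other direction.
The sketch below assumes a hypothetical wrapper,
.Fn to_multibyte ,
that reserves one byte so the result is always null terminated:
.Bd -literal -offset indent
```c
#include <stdlib.h>

/*
 * to_multibyte is a hypothetical wrapper, not a libc function.
 * Converts src into dst, which holds cap bytes, and always
 * null terminates the result.  Returns the number of bytes
 * stored before the terminator, or (size_t)-1 if a wide
 * character cannot be converted or the buffer has size zero.
 */
size_t
to_multibyte(char *dst, size_t cap, const wchar_t *src)
{
	size_t n;

	if (cap == 0)
		return ((size_t)-1);
	n = wcstombs(dst, src, cap - 1);	/* keep one byte free */
	if (n == (size_t)-1)
		return (n);
	dst[n] = 0;			/* terminate even when truncated */
	return (n);
}
```
.Ed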
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
The
.Fn mblen
and
.Fn mbtowc
functions return 0 if
.Fa mbchar
points to a null byte.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If any invalid multibyte characters are encountered,
both functions return \-1.
.Sh SEE ALSO
.Xr setlocale 3
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh HISTORY
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions are
.Ud
.Sh BUGS
The current implementation supports only the
.Li \&"C\&"
locale.
No multibyte or wide character encodings are recognized.