.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"	@(#)multibyte.3	8.1 (Berkeley) 06/04/93
.\"
.Dd June 4, 1993
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and code each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
.Pq up to Dv MB_LEN_MAX
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
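.Pp
As an illustration only, the following sketch counts the characters
in a multibyte string by calling
.Fn mblen
repeatedly.
The helper name
.Fn count_mbchars
is hypothetical and is not part of the library;
the sketch assumes that the string is null terminated.

```c
#include <stdlib.h>
#include <string.h>

/*
 * count_mbchars (hypothetical helper): return the number of multibyte
 * characters in the null-terminated string s, or -1 if a sequence that
 * is not a valid multibyte character is found.
 */
long
count_mbchars(const char *s)
{
	size_t left = strlen(s);	/* bytes not yet examined */
	long count = 0;
	int len;

	(void)mblen(NULL, 0);		/* reset to the initial shift state */
	while (left > 0) {
		len = mblen(s, left);
		if (len < 0)
			return (-1);	/* invalid multibyte character */
		if (len == 0)
			break;		/* null byte, end of string */
		s += len;
		left -= (size_t)len;
		count++;
	}
	return (count);
}
```

For a string made up only of single-byte characters the result
is the same as that of
.Fn strlen .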
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
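.Pp
A minimal sketch of decoding the first character of a string with
.Fn mbtowc
might look as follows.
The helper name
.Fn first_wchar
is hypothetical;
the sketch assumes a null-terminated string and passes its full length,
including the terminator, as the byte limit.

```c
#include <stdlib.h>
#include <string.h>

/*
 * first_wchar (hypothetical helper): decode the first multibyte
 * character of the null-terminated string s into *out.
 * Returns the number of bytes consumed, 0 if s is empty,
 * or -1 if s does not begin with a valid multibyte character.
 */
int
first_wchar(const char *s, wchar_t *out)
{
	(void)mbtowc(NULL, NULL, 0);	/* reset to the initial shift state */
	return (mbtowc(out, s, strlen(s) + 1));
}
```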
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
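.Pp
One way to satisfy the size requirement is to give the buffer room for
.Dv MB_LEN_MAX
bytes, as in the sketch below.
The helper name
.Fn encode_wchar
is hypothetical;
the extra byte for a terminator is a choice made for this example
and is not something
.Fn wctomb
requires.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * encode_wchar (hypothetical helper): convert wc to a multibyte
 * character in buf and null terminate it.  buf must have room for
 * at least MB_LEN_MAX + 1 bytes.  Returns the number of bytes in
 * the multibyte character, or -1 if wc cannot be converted.
 */
int
encode_wchar(wchar_t wc, char *buf)
{
	int len;

	(void)wctomb(NULL, 0);		/* reset to the initial shift state */
	len = wctomb(buf, wc);
	if (len < 0)
		return (-1);
	buf[len] = 0;			/* terminator added by this sketch */
	return (len);
}
```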
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
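.Pp
Because the terminator is stored only if there is room,
a caller that needs a terminated result can reserve one element for it.
The following is a sketch of that approach;
the helper name
.Fn to_wide
and its truncating behavior are assumptions made for the example.

```c
#include <stdlib.h>

/*
 * to_wide (hypothetical helper): convert the multibyte string src into
 * dst, which has room for dstlen wide characters.  The result is
 * truncated if necessary and is always null terminated.
 * Returns 0 on success, -1 on an invalid multibyte character or if
 * dstlen is 0.
 */
int
to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
	size_t n;

	if (dstlen == 0)
		return (-1);
	n = mbstowcs(dst, src, dstlen - 1);	/* keep one slot free */
	if (n == (size_t)-1)
		return (-1);
	dst[n] = 0;				/* n is at most dstlen - 1 */
	return (0);
}
```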
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
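.Pp
The reverse conversion can be wrapped in the same way.
This sketch reserves one byte so that the output is always terminated;
the helper name
.Fn to_multibyte
is hypothetical.

```c
#include <stdlib.h>

/*
 * to_multibyte (hypothetical helper): convert the wide character
 * string src into dst, which has room for dstlen bytes.  The result
 * is truncated at a character boundary if necessary and is always
 * null terminated.  Returns 0 on success, -1 if a wide character
 * cannot be converted or if dstlen is 0.
 */
int
to_multibyte(char *dst, size_t dstlen, const wchar_t *src)
{
	size_t n;

	if (dstlen == 0)
		return (-1);
	n = wcstombs(dst, src, dstlen - 1);	/* keep one byte free */
	if (n == (size_t)-1)
		return (-1);
	dst[n] = 0;				/* n is at most dstlen - 1 */
	return (0);
}
```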
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If any invalid multibyte characters are encountered,
both functions return (size_t)\-1.
.Sh SEE ALSO
.Xr euc 4 ,
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh BUGS
The current implementation does not support shift states.