.\" Copyright (c) 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" %sccs.include.redist.roff%
.\"
.\" @(#)multibyte.3 8.1 (Berkeley) 06/04/93
.\"
.Dd ""
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and encode each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
.Pq up to Dv MB_LEN_MAX
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
.Sh "RETURN VALUES"
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If any invalid multibyte characters are encountered,
both functions return \-1.
.Sh "SEE ALSO"
.Xr euc 4 ,
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh BUGS
The current implementation does not support shift states.
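.Sh EXAMPLES
The following is a minimal sketch, not taken from the library sources,
of how the functions described above might be combined.
The helper names
.Fn count_mbchars
and
.Fn roundtrip
and the 64-element work buffer are assumptions made for illustration only.
The first helper steps through a multibyte string with
.Fn mblen ;
the second converts a string to wide characters and back with
.Fn mbstowcs
and
.Fn wcstombs ,
treating a result with no room for the terminator as a failure.
.Bd -literal
```c
#include <stdlib.h>
#include <string.h>

/*
 * count_mbchars is a hypothetical helper, not part of the library.
 * It walks a multibyte string with mblen() and returns the number
 * of characters, or -1 if an invalid sequence is found.
 */
int
count_mbchars(const char *s)
{
	size_t left = strlen(s);
	int count = 0;
	int len;

	(void)mblen(NULL, 0);	/* reset any internal shift state */
	while (left > 0) {
		len = mblen(s, left);
		if (len < 0)
			return (-1);	/* unrecognized character */
		if (len == 0)
			break;		/* null byte: end of string */
		s += len;
		left -= (size_t)len;
		count++;
	}
	return (count);
}

/*
 * roundtrip is a hypothetical helper.  It converts src to wide
 * characters and back, storing the result in dst (dstsize bytes).
 * It returns 0 on success and -1 on failure.  The 64-element
 * buffer is an arbitrary size chosen for this sketch.
 */
int
roundtrip(const char *src, char *dst, size_t dstsize)
{
	wchar_t wbuf[64];
	size_t n;

	n = mbstowcs(wbuf, src, 64);
	if (n == (size_t)-1 || n >= 64)
		return (-1);	/* invalid input or no terminator */
	n = wcstombs(dst, wbuf, dstsize);
	if (n == (size_t)-1 || n >= dstsize)
		return (-1);	/* unconvertible or no terminator */
	return (0);
}
```
.Ed
.Pp
Called on a string of five single-byte characters in the default locale,
.Fn count_mbchars
should return 5.
Both helpers depend on the
.Dv LC_CTYPE
category of the current locale, so a program that uses them
would normally call
.Xr setlocale 3
first.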