.\" Copyright (c) 1993 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" %sccs.include.redist.roff%
.\"
.\" @(#)multibyte.3 5.1 (Berkeley) 03/02/93
.\"
.Dd ""
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages, such as Chinese,
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings:
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
that allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and encode each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
.Pq up to Dv MB_LEN_MAX
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are grouped with an adjacent character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
and zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character;
.Dv MB_CUR_MAX
bytes are always sufficient.
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if the current locale requires shift states,
and zero otherwise.
Otherwise,
.Fn mblen
and
.Fn mbtowc
return 0 if
.Fa mbchar
points to the null byte.
If the next
.Fa nbytes
or fewer bytes form a valid multibyte character,
they return the number of bytes in that character;
if they do not, they return \-1.
The
.Fn wctomb
function returns the number of bytes stored in
.Fa mbchar ,
or \-1 if
.Fa wchar
does not correspond to a valid multibyte character.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If
.Fn mbstowcs
encounters an invalid multibyte character, or
.Fn wcstombs
encounters a wide character that does not correspond to a valid
multibyte character, the function returns (size_t)\-1.
.Sh SEE ALSO
.Xr setlocale 3
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh HISTORY
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions are
.Ud
.Sh BUGS
The current implementation supports only the
.Li \&"C"
locale.
No multibyte or wide character encodings are recognized.