.\" $OpenBSD: mbrtoc16.3,v 1.1 2023/08/20 15:02:51 schwarze Exp $
.\"
.\" Copyright 2023 Ingo Schwarze <schwarze@openbsd.org>
.\" Copyright 2010 Stefan Sperling <stsp@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: August 20 2023 $
.Dt MBRTOC16 3
.Os
.Sh NAME
.Nm mbrtoc16
.Nd convert one UTF-8 encoded character to UTF-16
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo mbrtoc16
.Fa "char16_t * restrict pc16"
.Fa "const char * restrict s"
.Fa "size_t n"
.Fa "mbstate_t * restrict mbs"
.Fc
.Sh DESCRIPTION
The
.Fn mbrtoc16
function examines at most
.Fa n
bytes of the multibyte character byte string pointed to by
.Fa s ,
converts those bytes to a wide character,
and encodes the wide character using UTF-16.
In some cases, it is necessary to call this function
twice to convert a single character.
.Pp
Conversion happens in accordance with the conversion state
.Pf * Fa mbs ,
which must be initialized to zero before the application's first call to
.Fn mbrtoc16 .
For this function,
.Pf * Fa mbs
stores information about both the state of the UTF-8 input encoding
and the state of the UTF-16 output encoding.
If the previous call did not return
.Po Vt size_t Pc Ns \-1 ,
.Fa mbs
can safely be reused without reinitialization.
.Pp
The input encoding that
.Fn mbrtoc16
uses for
.Fa s
is determined by the
.Dv LC_CTYPE
category of the current locale.
If the locale is changed without reinitialization of
.Pf * Fa mbs ,
the behaviour is undefined.
.Pp
Unlike
.Xr mbtowc 3 ,
.Fn mbrtoc16
accepts an incomplete byte sequence pointed to by
.Fa s
which does not form a complete character but is potentially part of
a valid character.
In this case, the function consumes all such bytes.
The conversion state saved in
.Pf * Fa mbs
will be used to restart the suspended conversion during the next call.
.Pp
On systems other than
.Ox
that support state-dependent encodings,
.Fa s
may point to a special sequence of bytes called a
.Dq shift sequence ;
see
.Xr mbrtowc 3
for details.
.Pp
The following arguments cause special processing:
.Bl -tag -width 012345678901
.It Fa pc16 No == Dv NULL
The conversion from a multibyte character to a wide character is performed
and the conversion state may be affected, but the resulting wide character
is discarded.
.It Fa s No == Dv NULL
The arguments
.Fa pc16
and
.Fa n
are ignored and starting or continuing the conversion with an empty string
is attempted, discarding the conversion result.
.It Fa mbs No == Dv NULL
An internal
.Vt mbstate_t
object specific to the
.Fn mbrtoc16
function is used instead of the
.Fa mbs
argument.
This internal object is automatically initialized at program startup
and never changed by any
.Em libc
function except
.Fn mbrtoc16 .
.Pp
If
.Fn mbrtoc16
is called with a
.Dv NULL
.Fa mbs
argument and that call returns
.Po Vt size_t Pc Ns \-1 ,
the internal conversion state of
.Fn mbrtoc16
becomes permanently undefined and there is no way
to reset it to any defined state.
Consequently, after such a mishap, it is not safe to call
.Fn mbrtoc16
with a
.Dv NULL
.Fa mbs
argument ever again until the program is terminated.
.El
.Sh RETURN VALUES
.Bl -tag -width 012345678901
.It 0
The bytes pointed to by
.Fa s
form a terminating NUL character.
If
.Fa pc16
is not
.Dv NULL ,
a NUL wide character has been stored in
.Pf * Fa pc16 .
.It positive
.Fa s
points to a valid character, and the value returned is the number of
bytes completing the character.
If
.Fa pc16
is not
.Dv NULL ,
the first UTF-16 code unit of the corresponding wide character
has been stored in
.Pf * Fa pc16 .
If it is a UTF-16 high surrogate, the function needs to be called
again to retrieve a second UTF-16 code unit, the low surrogate.
On
.Ox ,
this happens if and only if the return value is 4,
but this equivalence does not hold on other operating systems
that support input encodings other than UTF-8.
.It Po Vt size_t Pc Ns \-1
.Fa s
points to an illegal byte sequence which does not form a valid multibyte
character in the current locale, or
.Fa mbs
points to an invalid or uninitialized object.
.Va errno
is set to
.Er EILSEQ
or
.Er EINVAL ,
respectively.
The conversion state object pointed to by
.Fa mbs
is left in an undefined state and must be reinitialized before being
used again.
.It Po Vt size_t Pc Ns \-2
.Fa s
points to an incomplete byte sequence of length
.Fa n
which has been consumed and contains part of a valid multibyte character.
The character may be completed by calling the same function again with
.Fa s
pointing to one or more subsequent bytes of the multibyte character and
.Fa mbs
pointing to the conversion state object used during conversion of the
incomplete byte sequence.
.It Po Vt size_t Pc Ns \-3
The second 16-bit code unit resulting from a previous call
has been stored into
.Pf * Fa pc16 ,
without consuming any additional bytes from
.Fa s .
.El
.Sh ERRORS
.Fn mbrtoc16
causes an error in the following cases:
.Bl -tag -width Er
.It Bq Er EILSEQ
.Fa s
points to an invalid multibyte character.
.It Bq Er EINVAL
.Fa mbs
points to an invalid or uninitialized
.Vt mbstate_t
object.
.El
.Sh SEE ALSO
.Xr c16rtomb 3 ,
.Xr mbrtowc 3 ,
.Xr setlocale 3
.Sh STANDARDS
.Fn mbrtoc16
conforms to
.St -isoC-2011 .
.Sh HISTORY
.Fn mbrtoc16
has been available since
.Ox 7.4 .
.Sh CAVEATS
On operating systems other than
.Ox
that support input encodings other than UTF-8, inspecting the return value
is insufficient to tell whether the function needs to be called again.
If the return value is positive, inspecting
.Pf * Fa pc16
is also required to make that decision.
Consequently, passing a
.Dv NULL
pointer for the
.Fa pc16
argument is discouraged because it can result
in a well-defined but unknown output encoding state.
The simplest way to recover from such an unknown state is to
reinitialize the object pointed to by
.Fa mbs .
.Pp
The C11 standard only requires the
.Fa pc16
argument to be encoded according to UTF-16
if the predefined environment macro
.Dv __STDC_UTF_16__
is defined with a value of 1.
On
.Ox ,
.In uchar.h
provides this definition.
Other operating systems which do not define
.Dv __STDC_UTF_16__
could theoretically use a different,
implementation-defined output encoding for
.Fa pc16
instead of UTF-16.
Writing portable code for an arbitrary output encoding is impossible
because the rules for when and how often the function needs to be called
again depend on the output encoding; the rules explained above are
specific to UTF-16.
Using UTF-16 as the output encoding of
.Fn mbrtoc16
becomes mandatory in C23.