.\"	$NetBSD: mbrtoc16.3,v 1.10 2024/08/23 12:59:49 riastradh Exp $
.\"
.\" Copyright (c) 2024 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd August 14, 2024
.Dt MBRTOC16 3
.Os
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh NAME
.Nm mbrtoc16
.Nd Restartable multibyte to UTF-16 conversion
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh LIBRARY
.Lb libc
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh SYNOPSIS
.
.In uchar.h
.
.Ft size_t
.Fo mbrtoc16
.Fa "char16_t * restrict pc16"
.Fa "const char * restrict s"
.Fa "size_t n"
.Fa "mbstate_t * restrict ps"
.Fc
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh DESCRIPTION
The
.Nm
function decodes multibyte characters in the current locale and
converts them to UTF-16, keeping state so it can restart after
incremental progress.
.Pp
Each call to
.Nm :
.Bl -enum -compact
.It
examines up to
.Fa n
bytes starting at
.Fa s ,
.It
yields a UTF-16 code unit if available by storing it at
.Li * Ns Fa pc16 ,
.It
saves state at
.Fa ps ,
and
.It
returns either the number of bytes consumed if any or a special return
value.
.El
.Pp
Specifically:
.Bl -bullet
.It
If the multibyte sequence at
.Fa s
is invalid after any previous input saved at
.Fa ps ,
or if an error occurs in decoding,
.Nm
returns
.Li (size_t)-1
and sets
.Xr errno 2
to indicate the error.
.It
If the multibyte sequence at
.Fa s
is still incomplete after
.Fa n
bytes, including any previous input saved in
.Fa ps ,
.Nm
saves its state in
.Fa ps
after all the input so far and returns
.Li "(size_t)-2" .
.Sy All
.Fa n
bytes of input are consumed in this case.
.It
If
.Nm
had previously decoded a multibyte character but has not yet yielded
all the code units of its UTF-16 encoding, it stores the next UTF-16
code unit at
.Li * Ns Fa pc16
and returns
.Li "(size_t)-3" .
.Sy \&No
bytes of input are consumed in this case.
.It
If
.Nm
decodes the null multibyte character, then it stores zero at
.Li * Ns Fa pc16
and returns zero.
.It
Otherwise,
.Nm
decodes a single multibyte character, stores the first (and possibly
only) code unit in its UTF-16 encoding at
.Li * Ns Fa pc16 ,
and returns the number of bytes consumed to decode the first multibyte
character.
.El
.Pp
If
.Fa pc16
is a null pointer, nothing is stored, but the effects on
.Fa ps
and the return value are unchanged.
.Pp
If
.Fa s
is a null pointer, the
.Nm
call is equivalent to:
.Bd -ragged -offset indent
.Fo mbrtoc16
.Li NULL ,
.Li \*q\*q ,
.Li 1 ,
.Fa ps
.Fc
.Ed
.Pp
This always returns zero, and has the effect of resetting
.Fa ps
to the initial conversion state, without writing to
.Fa pc16 ,
even if it is nonnull.
.Pp
If
.Fa ps
is a null pointer,
.Nm
uses an internal
.Vt mbstate_t
object with static storage duration, distinct from all other
.Vt mbstate_t
objects
.Po
including those used by
.Xr mbrtoc8 3 ,
.Xr mbrtoc32 3 ,
.Xr c8rtomb 3 ,
.Xr c16rtomb 3 ,
and
.Xr c32rtomb 3
.Pc ,
which is initialized at program startup to the initial conversion
state.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh IMPLEMENTATION NOTES
On well-formed input, the
.Nm
function yields either a Unicode scalar value in the Basic Multilingual
Plane (BMP), i.e., a 16-bit Unicode code point that is not a surrogate
code point, or, over two successive calls, yields the high and low
surrogate code points (in that order) of a Unicode scalar value outside
the BMP.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh RETURN VALUES
The
.Nm
function returns:
.Bl -tag -width Li
.It Li 0
.Bq null
if
.Nm
decoded a null multibyte character.
.It Ar i
.Bq code unit
where
.Li 1
\*(Le
.Ar i
\*(Le
.Fa n ,
if
.Nm
consumed
.Ar i
bytes of input to decode the next multibyte character, yielding a
UTF-16 code unit.
.It Li (size_t)-3
.Bq continuation
if
.Nm
consumed no new bytes of input but yielded a UTF-16 code unit that was
pending from previous input.
.It Li (size_t)-2
.Bq incomplete
if
.Nm
found only an incomplete multibyte sequence after all
.Fa n
bytes of input and any previous input, and saved its state to restart
in the next call with
.Fa ps .
.It Li (size_t)-1
.Bq error
if any encoding error was detected;
.Xr errno 2
is set to reflect the error.
.El
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh EXAMPLES
Print the UTF-16 code units of a multibyte string in hexadecimal text:
.Bd -literal -offset indent
char *s = ...;
size_t n = ...;
mbstate_t mbs = {0};	/* initial conversion state */

while (n) {
	char16_t c16;
	size_t len;

	len = mbrtoc16(&c16, s, n, &mbs);
	switch (len) {
	case 0:		/* NUL terminator */
		assert(c16 == 0);
		goto out;
	default:	/* scalar value or high surrogate */
		printf("U+%04"PRIx16"\en", (uint16_t)c16);
		break;
	case (size_t)-3:	/* low surrogate */
		printf("continue U+%04"PRIx16"\en", (uint16_t)c16);
		continue;	/* no input consumed; skip the update below */
	case (size_t)-2:	/* incomplete */
		printf("incomplete\en");
		goto readmore;
	case (size_t)-1:	/* error */
		printf("error: %d\en", errno);
		goto out;
	}
	s += len;	/* advance past the bytes just decoded */
	n -= len;
}
.Ed
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh ERRORS
.Bl -tag -width Bq
.It Bq Er EILSEQ
The multibyte sequence cannot be decoded in the current locale as a
Unicode scalar value.
.It Bq Er EIO
An error occurred in loading the locale's character conversions.
.El
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh SEE ALSO
.Xr c16rtomb 3 ,
.Xr c32rtomb 3 ,
.Xr c8rtomb 3 ,
.Xr mbrtoc32 3 ,
.Xr mbrtoc8 3 ,
.Xr uchar 3
.Rs
.%B The Unicode Standard
.%O Version 15.0 \(em Core Specification
.%Q The Unicode Consortium
.%D September 2022
.%U https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf
.Re
.Rs
.%A P. Hoffman
.%A F. Yergeau
.%T UTF-16, an encoding of ISO 10646
.%R RFC 2781
.%D February 2000
.%I Internet Engineering Task Force
.%U https://datatracker.ietf.org/doc/html/rfc2781
.Re
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh STANDARDS
The
.Nm
function conforms to
.St -isoC-2011 .
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh HISTORY
The
.Nm
function first appeared in
.Nx 11.0 .
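.Pp
The return values documented for
.Nm
can be wrapped in a small buffer-to-buffer converter.
The following is a minimal sketch, not part of libc:
the helper name decode_utf16 and its convention of returning
.Li (size_t)-1
for every failure are assumptions made for illustration.
It relies on the documented behavior that a high surrogate is followed,
on the next call, by a low surrogate reported with
.Li (size_t)-3
and no input consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: decode up to n bytes at s, in the current
 * locale, into at most cap UTF-16 code units at out.  Decoding stops
 * at a null multibyte character, which is not stored.  Returns the
 * number of code units stored, or (size_t)-1 on an encoding error, a
 * truncated trailing sequence, or lack of room in out.
 */
size_t
decode_utf16(const char *s, size_t n, char16_t *out, size_t cap)
{
	mbstate_t st;
	size_t used = 0;
	int pending = 0;	/* low surrogate still to be fetched? */

	assert(s != NULL && out != NULL);
	memset(&st, 0, sizeof(st));	/* initial conversion state */

	while (n > 0 || pending) {
		char16_t c16;
		size_t len = mbrtoc16(&c16, s, n, &st);

		if (len == (size_t)-1 || len == (size_t)-2)
			return (size_t)-1;	/* invalid or incomplete */
		if (len == 0)
			break;			/* null character */
		if (used == cap)
			return (size_t)-1;	/* output buffer full */
		out[used++] = c16;
		if (len != (size_t)-3) {	/* -3 consumes no input */
			s += len;
			n -= len;
		}
		/* A high surrogate means one more unit is pending. */
		pending = (c16 >= 0xD800 && c16 <= 0xDBFF);
	}
	return used;
}
```

A caller would typically select a locale with
.Xr setlocale 3
first; in the default C locale only ASCII input is certain to decode.
Collapsing all failures into one return value keeps the sketch short;
a real caller may want to distinguish an incomplete sequence, which can
be resumed with more input, from an encoding error.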