.\"	$NetBSD: mbrtoc16.3,v 1.10 2024/08/23 12:59:49 riastradh Exp $
.\"
.\" Copyright (c) 2024 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd August 14, 2024
.Dt MBRTOC16 3
.Os
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh NAME
.Nm mbrtoc16
.Nd Restartable multibyte to UTF-16 conversion
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh LIBRARY
.Lb libc
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh SYNOPSIS
.
.In uchar.h
.
.Ft size_t
.Fo mbrtoc16
.Fa "char16_t * restrict pc16"
.Fa "const char * restrict s"
.Fa "size_t n"
.Fa "mbstate_t * restrict ps"
.Fc
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh DESCRIPTION
The
.Nm
function decodes multibyte characters in the current locale and
converts them to UTF-16, keeping state so it can restart after
incremental progress.
.Pp
Each call to
.Nm :
.Bl -enum -compact
.It
examines up to
.Fa n
bytes starting at
.Fa s ,
.It
yields a UTF-16 code unit if available by storing it at
.Li * Ns Fa pc16 ,
.It
saves state at
.Fa ps ,
and
.It
returns either the number of bytes consumed if any or a special return
value.
.El
.Pp
Specifically:
.Bl -bullet
.It
If the multibyte sequence at
.Fa s
is invalid after any previous input saved at
.Fa ps ,
or if an error occurs in decoding,
.Nm
returns
.Li (size_t)-1
and sets
.Xr errno 2
to indicate the error.
.It
If the multibyte sequence at
.Fa s
is still incomplete after
.Fa n
bytes, including any previous input saved in
.Fa ps ,
.Nm
saves its state in
.Fa ps
after all the input so far and returns
.Li "(size_t)-2" .
.Sy All
.Fa n
bytes of input are consumed in this case.
.It
If
.Nm
had previously decoded a multibyte character but has not yet yielded
all the code units of its UTF-16 encoding, it stores the next UTF-16
code unit at
.Li * Ns Fa pc16
and returns
.Li "(size_t)-3" .
.Sy \&No
bytes of input are consumed in this case.
.It
If
.Nm
decodes the null multibyte character, then it stores zero at
.Li * Ns Fa pc16
and returns zero.
.It
Otherwise,
.Nm
decodes a single multibyte character, stores the first (and possibly
only) code unit in its UTF-16 encoding at
.Li * Ns Fa pc16 ,
and returns the number of bytes consumed to decode the first multibyte
character.
.El
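.Pp
The cases above can be combined into a loop that fills a caller-supplied
buffer.
The following is a minimal sketch, not part of the library: the name
mbs_to_c16s and its error convention are hypothetical, and it assumes
that a pending low surrogate is still yielded when the function is
called with no remaining input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: convert up to n bytes at s into out[0..cap).
 * Returns the number of code units stored, or (size_t)-1 on an
 * encoding error, incomplete trailing input, or a full buffer.
 * Conversion stops at a null multibyte character.
 */
static size_t
mbs_to_c16s(const char *s, size_t n, char16_t *out, size_t cap)
{
	mbstate_t st;
	size_t used = 0;
	int pending = 0;	/* last unit stored was a high surrogate */

	memset(&st, 0, sizeof(st));	/* initial conversion state */
	while (n > 0 || pending) {
		char16_t c16;
		size_t len = mbrtoc16(&c16, s, n, &st);

		if (len == (size_t)-1 || len == (size_t)-2)
			return (size_t)-1;
		if (len == 0)		/* null character: stop */
			break;
		if (used == cap)	/* no room for this code unit */
			return (size_t)-1;
		out[used++] = c16;
		if (len == (size_t)-3) {
			pending = 0;	/* low surrogate, no input consumed */
			continue;
		}
		pending = (c16 >= 0xD800 && c16 <= 0xDBFF);
		s += len;
		n -= len;
	}
	return used;
}
```

A caller would pass a buffer with at least one element per expected
code unit; a character outside the BMP needs two.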
.Pp
If
.Fa pc16
is a null pointer, nothing is stored, but the effects on
.Fa ps
and the return value are unchanged.
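.Pp
One use of a null
.Fa pc16
is to measure input without keeping the output.
A minimal sketch, in which the name mb_char_len is hypothetical and a
fresh conversion state is assumed for each call:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: byte length of the first multibyte character
 * in s[0..n), or one of the special return values of mbrtoc16.
 * No code unit is stored because pc16 is a null pointer.
 */
static size_t
mb_char_len(const char *s, size_t n)
{
	mbstate_t st;

	memset(&st, 0, sizeof(st));	/* initial conversion state */
	return mbrtoc16(NULL, s, n, &st);
}
```

The result is zero for the null character, as with any other call.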
.Pp
If
.Fa s
is a null pointer, the
.Nm
call is equivalent to:
.Bd -ragged -offset indent
.Fo mbrtoc16
.Li NULL ,
.Li \*q\*q ,
.Li 1 ,
.Fa ps
.Fc
.Ed
.Pp
This always returns zero, and has the effect of resetting
.Fa ps
to the initial conversion state, without writing to
.Fa pc16 ,
even if it is nonnull.
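.Pp
The reset can be sketched as follows.
The wrapper name reset_c16_state is hypothetical, and the sketch uses
the standard mbsinit function from wchar.h to confirm that the state is
initial afterwards:

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: reset *ps by passing a null s, then report
 * whether the call returned zero and left *ps in the initial
 * conversion state.
 */
static int
reset_c16_state(mbstate_t *ps)
{
	size_t r = mbrtoc16(NULL, NULL, 0, ps);

	return r == 0 && mbsinit(ps) != 0;
}
```

This is useful before reusing one state object for an unrelated input
stream.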
.Pp
If
.Fa ps
is a null pointer,
.Nm
uses an internal
.Vt mbstate_t
object with static storage duration, distinct from all other
.Vt mbstate_t
objects
.Po
including those used by
.Xr mbrtoc8 3 ,
.Xr mbrtoc32 3 ,
.Xr c8rtomb 3 ,
.Xr c16rtomb 3 ,
and
.Xr c32rtomb 3
.Pc ,
which is initialized at program startup to the initial conversion
state.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh IMPLEMENTATION NOTES
On well-formed input, the
.Nm
function yields either a Unicode scalar value in the Basic Multilingual
Plane (BMP), i.e., a 16-bit Unicode code point that is not a surrogate
code point, or, over two successive calls, yields the high and low
surrogate code points (in that order) of a Unicode scalar value outside
the BMP.
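.Pp
A caller that wants scalar values can recombine the two surrogates
with the standard UTF-16 arithmetic.
A minimal sketch; the helper names are hypothetical and are not
provided by the library:

```c
#include <assert.h>
#include <uchar.h>

/* High (leading) surrogates occupy U+D800 through U+DBFF. */
static int
c16_is_high_surrogate(char16_t c)
{
	return c >= 0xD800 && c <= 0xDBFF;
}

/* Low (trailing) surrogates occupy U+DC00 through U+DFFF. */
static int
c16_is_low_surrogate(char16_t c)
{
	return c >= 0xDC00 && c <= 0xDFFF;
}

/*
 * Combine a high and low surrogate into the scalar value they
 * encode.  The caller must have checked both with the predicates
 * above.
 */
static char32_t
c16_pair_to_scalar(char16_t hi, char16_t lo)
{
	return 0x10000 +
	    (((char32_t)(hi - 0xD800) << 10) | (char32_t)(lo - 0xDC00));
}
```

For instance, the pair 0xD83D, 0xDE00 encodes U+1F600.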
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh RETURN VALUES
The
.Nm
function returns:
.Bl -tag -width Li
.It Li 0
.Bq null
if
.Nm
decoded a null multibyte character.
.It Ar i
.Bq code unit
where
.Li 1
\*(Le
.Ar i
\*(Le
.Fa n ,
if
.Nm
consumed
.Ar i
bytes of input to decode the next multibyte character, yielding a
UTF-16 code unit.
.It Li (size_t)-3
.Bq continuation
if
.Nm
consumed no new bytes of input but yielded a UTF-16 code unit that was
pending from previous input.
.It Li (size_t)-2
.Bq incomplete
if
.Nm
found only an incomplete multibyte sequence after all
.Fa n
bytes of input and any previous input, and saved its state to restart
in the next call with
.Fa ps .
.It Li (size_t)-1
.Bq error
if any encoding error was detected;
.Xr errno 2
is set to reflect the error.
.El
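.Pp
Because the three special values are the largest values of
.Vt size_t ,
they must be tested before the return value is used as a byte count.
A minimal sketch of that dispatch; the enumeration and the function
name are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the five outcomes listed above. */
enum c16_outcome {
	C16_NULL,		/* 0: null character decoded */
	C16_UNIT,		/* 1..n: bytes consumed, code unit stored */
	C16_CONTINUATION,	/* (size_t)-3: pending code unit stored */
	C16_INCOMPLETE,		/* (size_t)-2: more input needed */
	C16_ERROR		/* (size_t)-1: errno is set */
};

/* Map a return value of mbrtoc16 to one of the outcomes. */
static enum c16_outcome
c16_classify(size_t r)
{
	switch (r) {
	case 0:
		return C16_NULL;
	case (size_t)-3:
		return C16_CONTINUATION;
	case (size_t)-2:
		return C16_INCOMPLETE;
	case (size_t)-1:
		return C16_ERROR;
	default:
		return C16_UNIT;
	}
}
```

Only the C16_UNIT outcome carries a byte count that may be added to the
input pointer.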
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh EXAMPLES
Print the UTF-16 code units of a multibyte string in hexadecimal text:
.Bd -literal -offset indent
char *s = ...;
size_t n = ...;
mbstate_t mbs = {0};    /* initial conversion state */

while (n) {
        char16_t c16;
        size_t len;

        len = mbrtoc16(&c16, s, n, &mbs);
        switch (len) {
        case 0:         /* NUL terminator */
                assert(c16 == 0);
                goto out;
        default:        /* scalar value or high surrogate */
                printf("U+%04"PRIx16"\en", (uint16_t)c16);
                break;
        case (size_t)-3: /* low surrogate; no input consumed */
                printf("continue U+%04"PRIx16"\en", (uint16_t)c16);
                continue;       /* do not advance s by (size_t)-3 */
        case (size_t)-2: /* incomplete */
                printf("incomplete\en");
                goto readmore;
        case (size_t)-1: /* error */
                printf("error: %d\en", errno);
                goto out;
        }
        s += len;
        n -= len;
}
.Ed
.Pp
The
.Li out
and
.Li readmore
labels stand for code in the surrounding program that finishes up or
supplies more input.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh ERRORS
.Bl -tag -width Bq
.It Bq Er EILSEQ
The multibyte sequence cannot be decoded in the current locale as a
Unicode scalar value.
.It Bq Er EIO
An error occurred in loading the locale's character conversions.
.El
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh SEE ALSO
.Xr c16rtomb 3 ,
.Xr c32rtomb 3 ,
.Xr c8rtomb 3 ,
.Xr mbrtoc32 3 ,
.Xr mbrtoc8 3 ,
.Xr uchar 3
.Rs
.%B The Unicode Standard
.%O Version 15.0 \(em Core Specification
.%Q The Unicode Consortium
.%D September 2022
.%U https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf
.Re
.Rs
.%A P. Hoffman
.%A F. Yergeau
.%T UTF-16, an encoding of ISO 10646
.%R RFC 2781
.%D February 2000
.%I Internet Engineering Task Force
.%U https://datatracker.ietf.org/doc/html/rfc2781
.Re
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh STANDARDS
The
.Nm
function conforms to
.St -isoC-2011 .
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh HISTORY
The
.Nm
function first appeared in
.Nx 11.0 .