.\" $OpenBSD: c16rtomb.3,v 1.1 2023/08/20 15:02:51 schwarze Exp $
.\"
.\" Copyright (c) 2023 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: August 20 2023 $
.Dt C16RTOMB 3
.Os
.Sh NAME
.Nm c16rtomb
.Nd convert one UTF-16 encoded character to UTF-8
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char * restrict s"
.Fa "char16_t c16"
.Fa "mbstate_t * restrict mbs"
.Fc
.Sh DESCRIPTION
This function converts one UTF-16 encoded character to UTF-8.
In some cases, it is necessary to call the function twice
to convert a single character.
.Pp
First, call
.Fn c16rtomb
passing the first 16-bit code unit of the UTF-16 encoded character in
.Fa c16 .
If the return value is greater than 0, the character is part of the UCS-2
range, the complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes has been written to the storage starting at
.Fa s ,
and the function does not need to be called again.
.Pp
If the return value is 0, the first 16-bit code unit is a UTF-16
high surrogate and the function needs to be called a second time,
this time passing the second 16-bit code unit of the UTF-16 encoded
character in
.Fa c16
and passing the same
.Fa mbs
that was passed to the first call.
If the second 16-bit code unit is a UTF-16 low surrogate,
the second call returns a value greater than 0,
the surrogate pair represents a Unicode code point
beyond the basic multilingual plane,
and the complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes is written to the storage starting at
.Fa s .
.Pp
The output encoding that
.Fn c16rtomb
uses in
.Fa s
is determined by the
.Dv LC_CTYPE
category of the current locale.
.Ox
only supports UTF-8 and ASCII output,
and this function is only useful for UTF-8.
.Pp
The following arguments cause special processing:
.Bl -tag -width 012345678901
.It Fa c16 No == 0
A NUL byte is stored to
.Pf * Fa s
and the state object pointed to by
.Fa mbs
is reset to the initial state.
On operating systems other than
.Ox
that support state-dependent multibyte encodings,
a special byte sequence
.Pq Dq shift sequence
is written before the NUL byte to return to the initial state
if that is required by the output encoding
and by the current output encoding state.
.It Fa mbs No == Dv NULL
An internal
.Vt mbstate_t
object specific to the
.Fn c16rtomb
function is used instead of the
.Fa mbs
argument.
This internal object is automatically initialized at program startup
and never changed by any
.Em libc
function except
.Fn c16rtomb .
.It Fa s No == Dv NULL
The object pointed to by
.Fa mbs ,
or the internal object if
.Fa mbs
is a
.Dv NULL
pointer, is reset to its initial state,
.Fa c16
is ignored, and 1 is returned.
.El
.Sh RETURN VALUES
.Fn c16rtomb
returns the number of bytes written to
.Fa s
on success or
.Po Vt size_t Pc Ns \-1
on failure, specifically:
.Bl -tag -width 10n
.It 0
The first 16-bit code unit was successfully decoded
as a UTF-16 high surrogate.
Nothing was written to
.Fa s
yet.
.It 1
The first 16-bit code unit was successfully decoded
as a character in the range U+0000 to U+007F, or
.Fa s
is
.Dv NULL .
.It 2
The first 16-bit code unit was successfully decoded
as a character in the range U+0080 to U+07FF.
.It 3
The first 16-bit code unit was successfully decoded
as a character in the range U+0800 to U+D7FF or U+E000 to U+FFFF.
.It 4
The second 16-bit code unit was successfully decoded as a UTF-16 low
surrogate, resulting in a character in the range U+10000 to U+10FFFF.
.It greater
Return values greater than 4 may occur on operating systems other than
.Ox
for output encodings other than UTF-8, in particular when a shift
sequence was written.
.It Po Vt size_t Pc Ns \-1
UTF-16 input decoding or
.Dv LC_CTYPE
output encoding failed, or
.Fa mbs
is invalid.
Nothing was written to
.Fa s ,
and
.Va errno
has been set.
.El
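.Sh EXAMPLES
The calling sequence described in DESCRIPTION can be sketched as a loop
over an array of UTF-16 code units.
This is a minimal illustration rather than code taken from
.Ox :
the helper name
.Fn utf16_to_mb ,
its signature, and the use of
.Er ERANGE
to report a full output buffer are assumptions made for this example.
A return value of 0 from
.Fn c16rtomb
is treated as a pending high surrogate,
as documented in RETURN VALUES.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>	/* MB_LEN_MAX */
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper, not part of libc: convert n UTF-16 code units
 * from src into a NUL-terminated multibyte string in dst, using the
 * LC_CTYPE encoding of the current locale.
 * Returns the number of bytes written, not counting the NUL,
 * or (size_t)-1 with errno set on failure.
 */
size_t
utf16_to_mb(char *dst, size_t dstsize, const char16_t *src, size_t n)
{
	char buf[MB_LEN_MAX];	/* large enough for any one character */
	mbstate_t mbs;
	size_t i, len, total = 0;
	int pending = 0;	/* high surrogate seen, low one missing */

	assert(dst != NULL && dstsize > 0);
	memset(&mbs, 0, sizeof(mbs));	/* initial conversion state */

	for (i = 0; i < n; i++) {
		len = c16rtomb(buf, src[i], &mbs);
		if (len == (size_t)-1)
			return (size_t)-1;	/* errno set by c16rtomb */

		/* 0 means: high surrogate stored in mbs, no output yet. */
		pending = (len == 0);

		/* Keep one byte free for the terminating NUL. */
		if (len >= dstsize - total) {
			errno = ERANGE;	/* assumption: our own convention */
			return (size_t)-1;
		}
		memcpy(dst + total, buf, len);
		total += len;
	}
	if (pending) {
		errno = EILSEQ;	/* input ended inside a surrogate pair */
		return (size_t)-1;
	}
	dst[total] = '\0';
	return total;
}
```

In a UTF-8 locale selected with
.Xr setlocale 3 ,
passing the two code units 0xD83D and 0xDE00 to this helper
is expected to go through both calls of the protocol
and produce a four-byte sequence.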
.Sh ERRORS
.Fn c16rtomb
causes an error in the following cases:
.Bl -tag -width Er
.It Bq Er EILSEQ
UTF-16 input decoding failed because the first 16-bit code unit
is neither a UCS-2 character nor a UTF-16 high surrogate,
or because the second 16-bit code unit is not a UTF-16 low surrogate;
or output encoding failed because the resulting character
cannot be represented in the output encoding selected with
.Dv LC_CTYPE .
.It Bq Er EINVAL
.Fa mbs
points to an invalid or uninitialized
.Vt mbstate_t
object.
.El
.Sh SEE ALSO
.Xr mbrtoc16 3 ,
.Xr setlocale 3 ,
.Xr wcrtomb 3
.Sh STANDARDS
.Fn c16rtomb
conforms to
.St -isoC-2011 .
.Sh HISTORY
.Fn c16rtomb
has been available since
.Ox 7.4 .
.Sh CAVEATS
The C11 standard only requires the
.Fa c16
argument to be interpreted according to UTF-16
if the predefined environment macro
.Dv __STDC_UTF_16__
is defined with a value of 1.
On
.Ox ,
.In uchar.h
provides this definition.
Other operating systems which do not define
.Dv __STDC_UTF_16__
could theoretically use a different,
implementation-defined input encoding for
.Fa c16
instead of UTF-16.
Using UTF-16 becomes mandatory in C23.
208