.\" $OpenBSD: c16rtomb.3,v 1.1 2023/08/20 15:02:51 schwarze Exp $
.\"
.\" Copyright (c) 2023 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: August 20 2023 $
.Dt C16RTOMB 3
.Os
.Sh NAME
.Nm c16rtomb
.Nd convert one UTF-16 encoded character to UTF-8
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char * restrict s"
.Fa "char16_t c16"
.Fa "mbstate_t * restrict mbs"
.Fc
.Sh DESCRIPTION
This function converts one UTF-16 encoded character to UTF-8.
A character beyond the basic multilingual plane is encoded in UTF-16
as a surrogate pair of two 16-bit code units,
and converting it requires two calls to the function.
.Pp
First, call
.Fn c16rtomb
passing the first 16-bit code unit of the UTF-16 encoded character in
.Fa c16 .
If the return value is greater than 0, the character is part of the UCS-2
range, its complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes has been written to the storage starting at
.Fa s ,
and no second call is needed.
.Pp
If the return value is 0, the first 16-bit code unit is a UTF-16
high surrogate, nothing has been written to
.Fa s
yet, and the function needs to be called a second time,
passing the second 16-bit code unit of the UTF-16 encoded character in
.Fa c16
together with the same
.Fa mbs
that was passed to the first call.
If the second 16-bit code unit is a UTF-16 low surrogate,
the surrogate pair represents a Unicode code point
beyond the basic multilingual plane,
the second call returns a value greater than 0,
and the complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes is written to the storage starting at
.Fa s .
Otherwise, the second call fails with
.Er EILSEQ .
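.Pp
The two-call procedure can be sketched as a loop over a buffer of
UTF-16 code units.
This is an illustration, not part of the API:
the helper
.Fn utf16_to_mb
is hypothetical, as is its choice of reporting a full output buffer with
.Er ERANGE
and an input that ends right after a high surrogate with
.Er EILSEQ .
For non-ASCII output, a UTF-8
.Dv LC_CTYPE
locale is assumed to have been selected with
.Xr setlocale 3 .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <locale.h>	/* setlocale(), used by callers */
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert n UTF-16 code units from in[] to the
 * LC_CTYPE output encoding, storing at most outsz bytes in out.
 * Returns the number of bytes stored, or (size_t)-1 with errno set.
 */
size_t
utf16_to_mb(char *out, size_t outsz, const char16_t *in, size_t n)
{
	char buf[MB_LEN_MAX];	/* room for one converted character */
	mbstate_t mbs;
	size_t i, len, total = 0;
	int pending = 0;	/* 1 right after a high surrogate */

	assert(out != NULL && in != NULL);
	memset(&mbs, 0, sizeof(mbs));	/* start in the initial state */
	for (i = 0; i < n; i++) {
		/* The same mbs is passed for both halves of a pair. */
		len = c16rtomb(buf, in[i], &mbs);
		if (len == (size_t)-1)
			return (size_t)-1;	/* errno set by c16rtomb */
		if (len == 0) {
			pending = 1;	/* high surrogate: need next unit */
			continue;
		}
		pending = 0;
		if (outsz - total < len) {
			errno = ERANGE;	/* assumption: caller's buffer is full */
			return (size_t)-1;
		}
		memcpy(out + total, buf, len);
		total += len;
	}
	if (pending) {
		errno = EILSEQ;	/* input ended inside a surrogate pair */
		return (size_t)-1;
	}
	return total;
}
```
.Ed
.Pp
Each character is converted into a small staging buffer first,
so the output never ends in a partial character.
With a UTF-8 locale, the code units 0x0041, 0x00E9, and 0x20AC
convert to the six bytes 41 C3 A9 E2 82 AC.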
.Pp
The output encoding that
.Fn c16rtomb
uses in
.Fa s
is determined by the
.Dv LC_CTYPE
category of the current locale.
.Ox
only supports UTF-8 and ASCII output,
and this function is only useful for UTF-8.
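.Pp
A minimal sketch of selecting a UTF-8
.Dv LC_CTYPE
before calling
.Fn c16rtomb
follows.
The helper
.Fn select_utf8_ctype
is hypothetical, and the locale names it tries are assumptions,
since the set of installed locales differs between systems.
Rather than trusting a locale name, it checks that U+20AC
converts to the UTF-8 bytes E2 82 AC.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: try some UTF-8 locale names (an assumption;
 * installed locales vary) and confirm the result by converting
 * U+20AC EURO SIGN, which is E2 82 AC in UTF-8.
 * Returns 1 if c16rtomb() now produces UTF-8, otherwise 0 with
 * LC_CTYPE set back to "C".
 */
int
select_utf8_ctype(void)
{
	static const char *const names[] = { "C.UTF-8", "en_US.UTF-8" };
	static const unsigned char euro[] = { 0xE2, 0x82, 0xAC };
	char buf[MB_LEN_MAX];
	mbstate_t mbs;
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (setlocale(LC_CTYPE, names[i]) == NULL)
			continue;	/* locale not available here */
		memset(&mbs, 0, sizeof(mbs));
		if (c16rtomb(buf, 0x20AC, &mbs) == sizeof(euro) &&
		    memcmp(buf, euro, sizeof(euro)) == 0)
			return 1;
	}
	setlocale(LC_CTYPE, "C");
	return 0;
}
```
.Ed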
.Pp
The following arguments cause special processing:
.Bl -tag -width 012345678901
.It Fa c16 No == 0
A NUL byte is stored to
.Pf * Fa s
and the state object pointed to by
.Fa mbs
is reset to the initial state.
On operating systems other than
.Ox
that support state-dependent multibyte encodings,
a special byte sequence
.Pq Dq shift sequence
is written before the NUL byte to return to the initial state
if that is required by the output encoding
and by the current output encoding state.
.It Fa mbs No == Dv NULL
An internal
.Vt mbstate_t
object specific to the
.Fn c16rtomb
function is used instead of the
.Fa mbs
argument.
This internal object is automatically initialized at program startup
and never changed by any
.Em libc
function except
.Fn c16rtomb .
.It Fa s No == Dv NULL
The object pointed to by
.Fa mbs ,
or the internal object if
.Fa mbs
is a
.Dv NULL
pointer, is reset to its initial state,
.Fa c16
is ignored, and 1 is returned.
.El
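.Pp
The special cases
.Fa c16 No == 0
and
.Fa s No == Dv NULL
can be used to terminate converted output and to discard
conversion state.
In the following sketch, both helper names are hypothetical,
and the caller is assumed to provide room for at least
.Dv MB_CUR_MAX
more bytes after the converted output.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: after len bytes of converted output in out,
 * pass c16 == 0 to store the terminating NUL byte (preceded by a
 * shift sequence on systems with state-dependent encodings) and to
 * reset *mbs to the initial state.  Returns the length of the
 * resulting string, excluding the NUL, or (size_t)-1 on failure.
 */
size_t
terminate_mb(char *out, size_t len, mbstate_t *mbs)
{
	size_t r;

	assert(out != NULL);
	r = c16rtomb(out + len, 0, mbs);
	if (r == (size_t)-1)
		return r;
	return len + r - 1;	/* the last byte written is the NUL */
}

/*
 * Hypothetical helper: discard any conversion state in *mbs, for
 * example a high surrogate that was never completed.  With s == NULL,
 * c16 is ignored, and on OpenBSD c16rtomb() returns 1; other
 * implementations might behave differently while a surrogate is
 * pending.  Returns 1 if the call did not fail.
 */
int
reset_c16_state(mbstate_t *mbs)
{
	return c16rtomb(NULL, 0, mbs) != (size_t)-1;
}
```
.Ed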
.Sh RETURN VALUES
.Fn c16rtomb
returns the number of bytes written to
.Fa s
on success or
.Po Vt size_t Pc Ns \-1
on failure, specifically:
.Bl -tag -width 10n
.It 0
The first 16-bit code unit was successfully decoded
as a UTF-16 high surrogate.
Nothing was written to
.Fa s
yet.
.It 1
The first 16-bit code unit was successfully decoded
as a character in the range U+0000 to U+007F, or
.Fa s
is
.Dv NULL .
.It 2
The first 16-bit code unit was successfully decoded
as a character in the range U+0080 to U+07FF.
.It 3
The first 16-bit code unit was successfully decoded
as a character in the range U+0800 to U+D7FF or U+E000 to U+FFFF.
.It 4
The second 16-bit code unit was successfully decoded as a UTF-16 low
surrogate, resulting in a character in the range U+10000 to U+10FFFF.
.It greater
Return values greater than 4 may occur on operating systems other than
.Ox
for output encodings other than UTF-8, in particular when a shift
sequence was written.
.It Po Vt size_t Pc Ns \-1
UTF-16 input decoding or
.Dv LC_CTYPE
output encoding failed, or
.Fa mbs
is invalid.
Nothing was written to
.Fa s ,
and
.Va errno
has been set.
.El
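.Pp
The ranges in the list above follow from the UTF-16 and UTF-8
encoding rules.
The following sketch restates them in C; it is an illustration,
not the libc implementation, and both function names are hypothetical.
Each code unit of a surrogate pair carries 10 bits of the code point,
which is offset by 0x10000.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/*
 * Hypothetical helper: combine a UTF-16 surrogate pair into the
 * code point it represents.  hi must be in D800-DBFF, lo in DC00-DFFF.
 */
char32_t
combine_surrogates(char16_t hi, char16_t lo)
{
	assert(hi >= 0xD800 && hi <= 0xDBFF);
	assert(lo >= 0xDC00 && lo <= 0xDFFF);
	return 0x10000 + (((char32_t)hi - 0xD800) << 10) +
	    ((char32_t)lo - 0xDC00);
}

/*
 * Hypothetical helper: number of UTF-8 bytes for a code point, as in
 * the return values 1 to 4 above, or (size_t)-1 if it has none.
 */
size_t
utf8_length(char32_t cp)
{
	if (cp <= 0x7F)
		return 1;
	if (cp <= 0x7FF)
		return 2;
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return (size_t)-1;	/* surrogates are not characters */
	if (cp <= 0xFFFF)
		return 3;
	if (cp <= 0x10FFFF)
		return 4;
	return (size_t)-1;	/* beyond the Unicode range */
}
```
.Ed
.Pp
For example, the pair 0xD83D 0xDE00 combines to U+1F600,
which needs 4 bytes in UTF-8, matching the return value of the second call.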
.Sh ERRORS
.Fn c16rtomb
causes an error in the following cases:
.Bl -tag -width Er
.It Bq Er EILSEQ
UTF-16 input decoding failed because the first 16-bit code unit
is neither a UCS-2 character nor a UTF-16 high surrogate,
or because the second 16-bit code unit is not a UTF-16 low surrogate;
or output encoding failed because the resulting character
cannot be represented in the output encoding selected with
.Dv LC_CTYPE .
.It Bq Er EINVAL
.Fa mbs
points to an invalid or uninitialized
.Vt mbstate_t
object.
.El
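.Pp
The following sketch distinguishes the error cases by inspecting
.Va errno
after a failed call.
The helper
.Fn c16_errno
is hypothetical.
It reinitializes the object pointed to by
.Fa mbs
after a failure, on the assumption that the conversion state
should not be relied upon after an error.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert one code unit into buf and report the
 * outcome as an errno value: 0 on success (including the return value
 * 0 for a high surrogate), otherwise the error, such as EILSEQ.
 * After a failure, *mbs is reset to the initial state (an assumption
 * of this sketch, not a documented requirement).
 */
int
c16_errno(char *buf, char16_t c16, mbstate_t *mbs)
{
	int saved;

	assert(buf != NULL && mbs != NULL);
	errno = 0;
	if (c16rtomb(buf, c16, mbs) != (size_t)-1)
		return 0;
	saved = errno;	/* EILSEQ or EINVAL on OpenBSD */
	memset(mbs, 0, sizeof(*mbs));	/* start over from the initial state */
	return saved;
}
```
.Ed
.Pp
For instance, a low surrogate such as 0xDC00 passed without a preceding
high surrogate fails with
.Er EILSEQ .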
.Sh SEE ALSO
.Xr mbrtoc16 3 ,
.Xr setlocale 3 ,
.Xr wcrtomb 3
.Sh STANDARDS
.Fn c16rtomb
conforms to
.St -isoC-2011 .
.Sh HISTORY
.Fn c16rtomb
has been available since
.Ox 7.4 .
.Sh CAVEATS
The C11 standard only requires the
.Fa c16
argument to be interpreted according to UTF-16
if the predefined environment macro
.Dv __STDC_UTF_16__
is defined with a value of 1.
On
.Ox ,
.In uchar.h
provides this definition.
Other operating systems which do not define
.Dv __STDC_UTF_16__
could theoretically use a different,
implementation-defined input encoding for
.Fa c16
instead of UTF-16.
Using UTF-16 becomes mandatory in C23.
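.Pp
A program that relies on UTF-16 input can check
.Dv __STDC_UTF_16__
at compile time.
In the following sketch, the macro
.Dv C16_IS_UTF16
and the function
.Fn c16_input_is_utf16
are hypothetical names.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <uchar.h>

/*
 * C16_IS_UTF16 (a hypothetical name) is 1 if the implementation
 * guarantees that char16_t arguments of c16rtomb() are UTF-16.
 * On OpenBSD, <uchar.h> defines __STDC_UTF_16__ for this purpose.
 * A program that cannot work without UTF-16 could use #error in
 * the #else branch instead.
 */
#if defined(__STDC_UTF_16__) && __STDC_UTF_16__ == 1
#define C16_IS_UTF16 1
#else
#define C16_IS_UTF16 0
#endif

/* Hypothetical helper: make the compile-time result available at run time. */
int
c16_input_is_utf16(void)
{
	return C16_IS_UTF16;
}
```
.Ed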