.\" $OpenBSD: c16rtomb.3,v 1.1 2023/08/20 15:02:51 schwarze Exp $
.\"
.\" Copyright (c) 2023 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: August 20 2023 $
.Dt C16RTOMB 3
.Os
.Sh NAME
.Nm c16rtomb
.Nd convert one UTF-16 encoded character to UTF-8
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char * restrict s"
.Fa "char16_t c16"
.Fa "mbstate_t * restrict mbs"
.Fc
.Sh DESCRIPTION
This function converts one UTF-16 encoded character to UTF-8.
In some cases, it is necessary to call the function twice
to convert a single character.
.Pp
First, call
.Fn c16rtomb
passing the first 16-bit code unit of the UTF-16 encoded character in
.Fa c16 .
If the return value is greater than 0, the character is part of the UCS-2
range, the complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes has been written to the storage starting at
.Fa s ,
and the function does not need to be called again.
.Pp
If the return value is 0, the first 16-bit code unit is a UTF-16
high surrogate and the function needs to be called a second time,
this time passing the second 16-bit code unit of the UTF-16 encoded
character in
.Fa c16
and passing the same
.Fa mbs
again that was also passed to the first call.
If the second 16-bit code unit is a UTF-16 low surrogate,
the second call returns a value greater than 0,
the surrogate pair represents a Unicode code point
beyond the basic multilingual plane,
and the complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes is written to the storage starting at
.Fa s .
.Pp
The output encoding that
.Fn c16rtomb
uses in
.Fa s
is determined by the
.Dv LC_CTYPE
category of the current locale.
.Ox
only supports UTF-8 and ASCII output,
and this function is only useful for UTF-8.
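.Pp
The two-call procedure described above can be sketched as a loop over
UTF-16 code units that shares one
.Vt mbstate_t
object across all calls.
This is only an illustrative sketch, not code taken from
.Ox :
the helper name
.Fn utf16_to_mb
and its interface are hypothetical.
It assumes that
.Fa out
has room for
.Fa outsz
bytes and, for non-ASCII input, that the caller has already selected
a UTF-8 locale with
.Xr setlocale 3 .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of libc: convert n UTF-16 code units
 * from in to the multibyte encoding of the current locale, storing at
 * most outsz bytes to out.  Returns the number of bytes stored, or
 * (size_t)-1 on a conversion error, on a dangling high surrogate,
 * or if out is too small.
 */
size_t
utf16_to_mb(const char16_t *in, size_t n, char *out, size_t outsz)
{
	mbstate_t mbs;
	char buf[MB_LEN_MAX];	/* MB_CUR_MAX never exceeds MB_LEN_MAX */
	size_t i, len, total = 0;

	assert(in != NULL && out != NULL);
	memset(&mbs, 0, sizeof(mbs));	/* initial conversion state */

	for (i = 0; i < n; i++) {
		len = c16rtomb(buf, in[i], &mbs);
		if (len == (size_t)-1)
			return (size_t)-1;	/* errno has been set */
		/* len == 0: high surrogate kept in mbs, nothing written */
		if (len > outsz - total)
			return (size_t)-1;	/* output buffer too small */
		memcpy(out + total, buf, len);
		total += len;
	}
	/* A high surrogate without its low surrogate leaves mbs pending. */
	if (!mbsinit(&mbs))
		return (size_t)-1;
	return total;
}
```
.Ed
.Pp
A return value of 0 needs no separate branch in this loop:
no bytes are copied, and the pending high surrogate stays in
.Fa mbs
until the next code unit is passed in.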
.Pp
The following arguments cause special processing:
.Bl -tag -width 012345678901
.It Fa c16 No == 0
A NUL byte is stored to
.Pf * Fa s
and the state object pointed to by
.Fa mbs
is reset to the initial state.
On operating systems other than
.Ox
that support state-dependent multibyte encodings,
a special byte sequence
.Pq Dq shift sequence
is written before the NUL byte to return to the initial state
if that is required by the output encoding
and by the current output encoding state.
.It Fa mbs No == Dv NULL
An internal
.Vt mbstate_t
object specific to the
.Fn c16rtomb
function is used instead of the
.Fa mbs
argument.
This internal object is automatically initialized at program startup
and never changed by any
.Em libc
function except
.Fn c16rtomb .
.It Fa s No == Dv NULL
The object pointed to by
.Fa mbs ,
or the internal object if
.Fa mbs
is a
.Dv NULL
pointer, is reset to its initial state,
.Fa c16
is ignored, and 1 is returned.
.El
.Sh RETURN VALUES
.Fn c16rtomb
returns the number of bytes written to
.Fa s
on success or
.Po Vt size_t Pc Ns \-1
on failure, specifically:
.Bl -tag -width 10n
.It 0
The first 16-bit code unit was successfully decoded
as a UTF-16 high surrogate.
Nothing was written to
.Fa s
yet.
.It 1
The first 16-bit code unit was successfully decoded
as a character in the range U+0000 to U+007F, or
.Fa s
is
.Dv NULL .
.It 2
The first 16-bit code unit was successfully decoded
as a character in the range U+0080 to U+07FF.
.It 3
The first 16-bit code unit was successfully decoded
as a character in the range U+0800 to U+D7FF or U+E000 to U+FFFF.
.It 4
The second 16-bit code unit was successfully decoded as a UTF-16 low
surrogate, resulting in a character in the range U+10000 to U+10FFFF.
.It greater
Return values greater than 4 may occur on operating systems other than
.Ox
for output encodings other than UTF-8, in particular when a shift
sequence was written.
.It Po Vt size_t Pc Ns \-1
UTF-16 input decoding or
.Dv LC_CTYPE
output encoding failed, or
.Fa mbs
is invalid.
Nothing was written to
.Fa s ,
and
.Va errno
has been set.
.El
.Sh ERRORS
.Fn c16rtomb
causes an error in the following cases:
.Bl -tag -width Er
.It Bq Er EILSEQ
UTF-16 input decoding failed because the first 16-bit code unit
is neither a UCS-2 character nor a UTF-16 high surrogate,
or because the second 16-bit code unit is not a UTF-16 low surrogate;
or output encoding failed because the resulting character
cannot be represented in the output encoding selected with
.Dv LC_CTYPE .
.It Bq Er EINVAL
.Fa mbs
points to an invalid or uninitialized
.Vt mbstate_t
object.
.El
.Sh SEE ALSO
.Xr mbrtoc16 3 ,
.Xr setlocale 3 ,
.Xr wcrtomb 3
.Sh STANDARDS
.Fn c16rtomb
conforms to
.St -isoC-2011 .
.Sh HISTORY
.Fn c16rtomb
has been available since
.Ox 7.4 .
.Sh CAVEATS
The C11 standard only requires the
.Fa c16
argument to be interpreted according to UTF-16
if the predefined environment macro
.Dv __STDC_UTF_16__
is defined with a value of 1.
On
.Ox ,
.In uchar.h
provides this definition.
Other operating systems which do not define
.Dv __STDC_UTF_16__
could theoretically use a different,
implementation-defined input encoding for
.Fa c16
instead of UTF-16.
Using UTF-16 becomes mandatory in C23.
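.Pp
A portable program may want to check the macro discussed above
before relying on UTF-16 input.
The following is a minimal sketch under that assumption;
the helper
.Fn c16_is_utf16
is hypothetical and not part of any library.
It merely reports at run time what the preprocessor saw after including
.In uchar.h .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <uchar.h>

/* char16_t must be able to hold every 16-bit code unit. */
static_assert((char16_t)-1 >= 0xFFFF, "char16_t is at least 16 bits wide");

/*
 * Hypothetical helper: return 1 if the implementation promises that
 * char16_t values are UTF-16 encoded, 0 if the encoding of the c16
 * argument is implementation-defined.
 */
int
c16_is_utf16(void)
{
#if defined(__STDC_UTF_16__) && __STDC_UTF_16__ == 1
	return 1;
#else
	return 0;
#endif
}
```
.Ed
.Pp
A program that cannot cope with a different input encoding could
instead turn the same test into a compile-time error.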