.\" $OpenBSD: c16rtomb.3,v 1.1 2023/08/20 15:02:51 schwarze Exp $
.\"
.\" Copyright (c) 2023 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: August 20 2023 $
.Dt C16RTOMB 3
.Os
.Sh NAME
.Nm c16rtomb
.Nd convert one UTF-16 encoded character to UTF-8
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char * restrict s"
.Fa "char16_t c16"
.Fa "mbstate_t * restrict mbs"
.Fc
.Sh DESCRIPTION
This function converts one UTF-16 encoded character to UTF-8.
In some cases, it is necessary to call the function twice
to convert a single character.
.Pp
First, call
.Fn c16rtomb
passing the first 16-bit code unit of the UTF-16 encoded character in
.Fa c16 .
If the return value is greater than 0, the character is part of the UCS-2
range, the complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes has been written to the storage starting at
.Fa s ,
and the function does not need to be called again.
.Pp
If the return value is 0, the first 16-bit code unit is a UTF-16
high surrogate and the function needs to be called a second time,
this time passing the second 16-bit code unit of the UTF-16 encoded
character in
.Fa c16
and passing the same
.Fa mbs
again that was also passed to the first call.
If the second 16-bit code unit is a UTF-16 low surrogate,
the second call returns a value greater than 0,
the surrogate pair represents a Unicode code point
beyond the basic multilingual plane,
and the complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes is written to the storage starting at
.Fa s .
.Pp
The output encoding that
.Fn c16rtomb
uses in
.Fa s
is determined by the
.Dv LC_CTYPE
category of the current locale.
.Ox
only supports UTF-8 and ASCII output,
and this function is only useful for UTF-8.
.Pp
The following arguments cause special processing:
.Bl -tag -width 012345678901
.It Fa c16 No == 0
A NUL byte is stored to
.Pf * Fa s
and the state object pointed to by
.Fa mbs
is reset to the initial state.
On operating systems other than
.Ox
that support state-dependent multibyte encodings,
a special byte sequence
.Pq Dq shift sequence
is written before the NUL byte to return to the initial state
if that is required by the output encoding
and by the current output encoding state.
.It Fa mbs No == Dv NULL
An internal
.Vt mbstate_t
object specific to the
.Fn c16rtomb
function is used instead of the
.Fa mbs
argument.
This internal object is automatically initialized at program startup
and never changed by any
.Em libc
function except
.Fn c16rtomb .
.It Fa s No == Dv NULL
The object pointed to by
.Fa mbs ,
or the internal object if
.Fa mbs
is a
.Dv NULL
pointer, is reset to its initial state,
.Fa c16
is ignored, and 1 is returned.
.El
.Sh RETURN VALUES
.Fn c16rtomb
returns the number of bytes written to
.Fa s
on success or
.Po Vt size_t Pc Ns \-1
on failure, specifically:
.Bl -tag -width 10n
.It 0
The first 16-bit code unit was successfully decoded
as a UTF-16 high surrogate.
Nothing was written to
.Fa s
yet.
.It 1
The first 16-bit code unit was successfully decoded
as a character in the range U+0000 to U+007F, or
.Fa s
is
.Dv NULL .
.It 2
The first 16-bit code unit was successfully decoded
as a character in the range U+0080 to U+07FF.
.It 3
The first 16-bit code unit was successfully decoded
as a character in the range U+0800 to U+D7FF or U+E000 to U+FFFF.
.It 4
The second 16-bit code unit was successfully decoded as a UTF-16 low
surrogate, resulting in a character in the range U+10000 to U+10FFFF.
.It greater
Return values greater than 4 may occur on operating systems other than
.Ox
for output encodings other than UTF-8, in particular when a shift
sequence was written.
.It Po Vt size_t Pc Ns \-1
UTF-16 input decoding or
.Dv LC_CTYPE
output encoding failed, or
.Fa mbs
is invalid.
Nothing was written to
.Fa s ,
and
.Va errno
has been set.
.El
.Sh ERRORS
.Fn c16rtomb
causes an error in the following cases:
.Bl -tag -width Er
.It Bq Er EILSEQ
UTF-16 input decoding failed because the first 16-bit code unit
is neither a UCS-2 character nor a UTF-16 high surrogate,
or because the second 16-bit code unit is not a UTF-16 low surrogate;
or output encoding failed because the resulting character
cannot be represented in the output encoding selected with
.Dv LC_CTYPE .
.It Bq Er EINVAL
.Fa mbs
points to an invalid or uninitialized
.Vt mbstate_t
object.
.El
.Sh SEE ALSO
.Xr mbrtoc16 3 ,
.Xr setlocale 3 ,
.Xr wcrtomb 3
.Sh STANDARDS
.Fn c16rtomb
conforms to
.St -isoC-2011 .
.Sh HISTORY
.Fn c16rtomb
has been available since
.Ox 7.4 .
.Sh CAVEATS
The C11 standard only requires the
.Fa c16
argument to be interpreted according to UTF-16
if the predefined environment macro
.Dv __STDC_UTF_16__
is defined with a value of 1.
On
.Ox ,
.In uchar.h
provides this definition.
Other operating systems which do not define
.Dv __STDC_UTF_16__
could theoretically use a different,
implementation-defined input encoding for
.Fa c16
instead of UTF-16.
Using UTF-16 becomes mandatory in C23.
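.Sh EXAMPLES
The calling sequence from the
.Sx DESCRIPTION
can be sketched as a loop over an array of UTF-16 code units.
This is a minimal sketch under stated assumptions, not a definitive
implementation:
.Fn utf16_to_mb
is a hypothetical helper name that is not part of libc,
and the caller is assumed to have selected a suitable
.Dv LC_CTYPE
locale beforehand.
.Bd -literal -offset indent
```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * utf16_to_mb() is a hypothetical helper, not part of libc.
 * Convert n UTF-16 code units from src to the multibyte encoding
 * of the current LC_CTYPE locale and store at most dstsize bytes
 * in dst, without appending a terminating NUL byte.
 * Return the number of bytes stored, or (size_t)-1 if c16rtomb()
 * fails (errno is then set by c16rtomb()), if dst is too small,
 * or if the input ends after a high surrogate.
 */
size_t
utf16_to_mb(char *dst, size_t dstsize, const char16_t *src, size_t n)
{
	char buf[MB_LEN_MAX];
	mbstate_t mbs;
	size_t i, len, total = 0;

	/* Start from the initial conversion state. */
	memset(&mbs, 0, sizeof(mbs));

	for (i = 0; i < n; i++) {
		/* The same mbs is passed for both halves of a pair. */
		len = c16rtomb(buf, src[i], &mbs);
		if (len == (size_t)-1)
			return (size_t)-1;
		/* len == 0: high surrogate remembered, nothing written. */
		if (len > dstsize - total)
			return (size_t)-1;
		memcpy(dst + total, buf, len);
		total += len;
	}

	/* A pending high surrogate leaves mbs in a non-initial state. */
	if (!mbsinit(&mbs))
		return (size_t)-1;
	return total;
}
```
.Ed
.Pp
For instance, after selecting a UTF-8 locale with
.Xr setlocale 3 ,
passing the two code units 0xD83D and 0xDE00, which encode U+1F600,
should store four bytes, in line with the return value 4 documented in
.Sx RETURN VALUES .
Using a local
.Vt mbstate_t
object rather than a
.Dv NULL
pointer keeps the helper independent of the internal state object.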