.\" $OpenBSD: utf8.7,v 1.5 2017/05/31 17:16:48 schwarze Exp $
.\"
.\" Copyright (c) 2017 Ted Unangst <tedu@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: May 31 2017 $
.Dt UTF8 7
.Os
.Sh NAME
.Nm utf8
.Nd UTF-8 text encoding
.Sh DESCRIPTION
UTF-8 is a multibyte encoding for Unicode text.
It is the preferred format for non-ASCII text.
.Pp
The length of a UTF-8 sequence varies depending on the encoded value.
If the high bit of the first byte is zero, the sequence length is one and
the value is the remaining seven bits.
If the high bit is set, then the number of high bits set, followed by a zero
bit, indicates the length of the sequence and the value is formed by combining
the low bits of each byte.
Continuation bytes all have the same format, with the top two bits set and
unset, respectively, and six value bits.
.Pp
Unicode ranges and their encoding formats:
.Bl -tag -width Ds
.It 0x0 - 0x7f
One byte.
0.......
.It 0x80 - 0x7ff
Two bytes.
110..... 10......
.It 0x800 - 0xffff
Three bytes.
1110.... 10...... 10......
.It 0x10000 - 0x10ffff
Four bytes.
11110... 10...... 10...... 10......
.El
.Sh SEE ALSO
.Xr ascii 7
.Sh STANDARDS
.Rs
.%A F. Yergeau
.%D November 2003
.%R RFC 3629
.%T UTF-8, a transformation format of ISO 10646
.Re
.Pp
The Unicode Standard.
.Sh CAVEATS
Beware of overlong encodings.
An overlong encoding uses more bytes than the shortest form needed for
its value, for example 0xc0 0x80 for the value 0x0.
Such sequences are invalid and decoders must reject them.
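.Pp
The following is a minimal sketch of a decoder that follows the byte
formats described above and rejects overlong encodings.
The function name
.Fn utf8_decode
and its return convention are assumptions made for illustration only;
it is not part of any system library.
As an extra assumption beyond the text above, it also rejects the
surrogate range 0xd800 - 0xdfff and values above 0x10ffff, as RFC 3629
requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a system interface.
 * Decode one UTF-8 sequence from s (at most n bytes) into *cp.
 * Returns the sequence length (1 to 4), or 0 if the input is empty,
 * truncated, malformed, overlong, a surrogate, or above 0x10ffff.
 */
static size_t
utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
	/* Smallest value that legitimately needs each sequence length. */
	static const uint32_t minval[] = { 0, 0x0, 0x80, 0x800, 0x10000 };
	uint32_t v;
	size_t i, len;

	assert(s != NULL && cp != NULL);

	if (n == 0)
		return 0;
	if (s[0] < 0x80) {			/* 0....... */
		len = 1;
		v = s[0];
	} else if ((s[0] & 0xe0) == 0xc0) {	/* 110..... */
		len = 2;
		v = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {	/* 1110.... */
		len = 3;
		v = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {	/* 11110... */
		len = 4;
		v = s[0] & 0x07;
	} else
		return 0;	/* stray continuation or invalid first byte */

	if (n < len)
		return 0;	/* truncated sequence */

	for (i = 1; i < len; i++) {
		if ((s[i] & 0xc0) != 0x80)	/* must be 10...... */
			return 0;
		v = (v << 6) | (s[i] & 0x3f);
	}

	if (v < minval[len])
		return 0;	/* overlong encoding */
	if (v > 0x10ffff || (v >= 0xd800 && v <= 0xdfff))
		return 0;	/* outside the range UTF-8 may encode */

	*cp = v;
	return len;
}
```

With this sketch, the two-byte sequence 0xc3 0xa9 decodes to 0xe9, while
the overlong sequence 0xc0 0x80 is rejected with a return value of 0.
Comparing the decoded value against the minimum for its length keeps the
overlong check in one place rather than spreading it across special cases
for individual first bytes.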