.\" Copyright (c) 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Paul Borman at Krystal Technologies.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)utf2.4 8.1 (Berkeley) 6/4/93
.\" $FreeBSD: head/lib/libc/locale/utf8.5 165903 2007-01-09 00:28:16Z imp $
.\"
.Dd April 7, 2004
.Dt UTF8 5
.Os
.Sh NAME
.Nm utf8
.Nd "UTF-8, a transformation format of ISO 10646"
.Sh SYNOPSIS
.Nm ENCODING
.Qq UTF-8
.Sh DESCRIPTION
The
.Nm UTF-8
encoding represents UCS-4 characters as a sequence of octets, using
between 1 and 6 for each character.
It is backwards compatible with
.Tn ASCII ,
so 0x00-0x7f refer to the
.Tn ASCII
character set.
The multibyte encoding of
.No non- Ns Tn ASCII
characters
consists entirely of bytes whose high order bit is set.
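.Pp
As a minimal sketch of the high-order-bit property just described, the
following C fragment tests whether every octet of a buffer has its high
order bit set.
The function name all_high_bit_set is hypothetical and is not part of any
library interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if every octet in buf[0..len) has its
 * high order bit set (as in a multibyte UTF-8 sequence), 0 otherwise.
 * An empty buffer yields 1.
 */
static int
all_high_bit_set(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		/* Octets 0x00-0x7f are plain ASCII: high order bit clear. */
		if ((buf[i] & 0x80) == 0)
			return (0);
	}
	return (1);
}
```

For the two-octet sequence 0xC3 0xA9 the function returns 1; for the
single ASCII octet 0x41 it returns 0.
.Pp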
The actual
encoding is represented by the following table:
.Bd -literal
[0x00000000 - 0x0000007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x00000080 - 0x000007ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x00000800 - 0x0000ffff] [bbbbbbbb.bbbbbbbb] ->
        1110bbbb, 10bbbbbb, 10bbbbbb
[0x00010000 - 0x001fffff] [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
        11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0x00200000 - 0x03ffffff] [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
        111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0x04000000 - 0x7fffffff] [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
        1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
.Ed
.Pp
If more than a single representation of a value exists (for example,
0x00; 0xC0 0x80; 0xE0 0x80 0x80) the shortest representation is always
used.
Longer ones are detected as an error as they pose a potential
security risk, and destroy the 1:1 character:octet sequence mapping.
.Sh SEE ALSO
.Xr euc 5
.Rs
.%A "Rob Pike"
.%A "Ken Thompson"
.%T "Hello World"
.%J "Proceedings of the Winter 1993 USENIX Technical Conference"
.%Q "USENIX Association"
.%D "January 1993"
.Re
.Rs
.%A "F. Yergeau"
.%T "UTF-8, a transformation format of ISO 10646"
.%O "RFC 2279"
.%D "January 1998"
.Re
.Rs
.%Q "The Unicode Consortium"
.%T "The Unicode Standard, Version 3.0"
.%D "2000"
.%O "as amended by the Unicode Standard Annex #27: Unicode 3.1 and by the Unicode Standard Annex #28: Unicode 3.2"
.Re
.Sh STANDARDS
The
.Nm
encoding is compatible with RFC 2279 and Unicode 3.2.
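.Sh EXAMPLES
The encoding table and the shortest-representation rule in DESCRIPTION can
be sketched in C roughly as follows.
This is an illustration under stated assumptions, not the libc
implementation: the names utf8_encode and utf8_decode are hypothetical, the
output buffer is assumed to hold at least 6 octets, and a return value of 0
is used here to signal an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode c into out[] using the shortest form from
 * the table. Returns the number of octets written, or 0 if c is above
 * 0x7fffffff.
 */
static size_t
utf8_encode(uint32_t c, unsigned char *out)
{
	/* Largest value representable with n continuation octets. */
	static const uint32_t limit[6] = {
		0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff
	};
	/* Lead octet prefix for n continuation octets. */
	static const unsigned char lead[6] = {
		0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc
	};
	size_t n, i;

	assert(out != NULL);
	for (n = 0; n < 6 && c > limit[n]; n++)
		continue;
	if (n == 6)
		return (0);
	/* Fill continuation octets from the last one backwards. */
	for (i = n; i > 0; i--) {
		out[i] = (unsigned char)(0x80 | (c & 0x3f));
		c >>= 6;
	}
	out[0] = (unsigned char)(lead[n] | c);
	return (n + 1);
}

/*
 * Hypothetical helper: decode one character from buf[0..len) into *cp.
 * Returns the number of octets consumed, or 0 if the sequence is
 * truncated, malformed, or longer than the shortest representation.
 */
static size_t
utf8_decode(const unsigned char *buf, size_t len, uint32_t *cp)
{
	/* Smallest value that needs n continuation octets. */
	static const uint32_t min[6] = {
		0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000
	};
	uint32_t c;
	size_t n, i;

	assert(buf != NULL && cp != NULL);
	if (len == 0)
		return (0);
	if (buf[0] < 0x80) {
		n = 0; c = buf[0];
	} else if ((buf[0] & 0xe0) == 0xc0) {
		n = 1; c = buf[0] & 0x1f;
	} else if ((buf[0] & 0xf0) == 0xe0) {
		n = 2; c = buf[0] & 0x0f;
	} else if ((buf[0] & 0xf8) == 0xf0) {
		n = 3; c = buf[0] & 0x07;
	} else if ((buf[0] & 0xfc) == 0xf8) {
		n = 4; c = buf[0] & 0x03;
	} else if ((buf[0] & 0xfe) == 0xfc) {
		n = 5; c = buf[0] & 0x01;
	} else {
		return (0);	/* stray continuation octet, 0xfe or 0xff */
	}
	if (len < n + 1)
		return (0);
	for (i = 1; i <= n; i++) {
		if ((buf[i] & 0xc0) != 0x80)
			return (0);
		c = (c << 6) | (buf[i] & 0x3f);
	}
	if (c < min[n])
		return (0);	/* not the shortest representation */
	*cp = c;
	return (n + 1);
}
```

With these helpers, the value 0xE9 encodes to the octets 0xC3 0xA9, while
decoding either 0xC0 0x80 or 0xE0 0x80 0x80 returns 0, because both are
longer representations of the value 0x00.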