.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Paul Borman at Krystal Technologies.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)utf2.4	8.1 (Berkeley) 6/4/93
.\" $FreeBSD: head/lib/libc/locale/utf8.5 165903 2007-01-09 00:28:16Z imp $
.\"
.Dd April 7, 2004
.Dt UTF8 5
.Os
.Sh NAME
.Nm utf8
.Nd "UTF-8, a transformation format of ISO 10646"
.Sh SYNOPSIS
.Nm ENCODING
.Qq UTF-8
.Sh DESCRIPTION
The
.Nm UTF-8
encoding represents UCS-4 characters as a sequence of octets, using
between 1 and 6 octets for each character.
It is backwards compatible with
.Tn ASCII ,
so the octets 0x00-0x7f represent the
.Tn ASCII
character set.
The multibyte encoding of a
.No non- Ns Tn ASCII
character
consists entirely of octets whose high order bit is set.
The encoding is defined by the following table:
.Bd -literal
[0x00000000 - 0x0000007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x00000080 - 0x000007ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x00000800 - 0x0000ffff] [bbbbbbbb.bbbbbbbb] ->
	1110bbbb, 10bbbbbb, 10bbbbbb
[0x00010000 - 0x001fffff] [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
	11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0x00200000 - 0x03ffffff] [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0x04000000 - 0x7fffffff] [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
.Ed
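.Pp
As an illustration of the table above, the following C sketch encodes a
single UCS-4 value.
The function
.Fn utf8_encode
and its interface are hypothetical and are not part of libc; the sketch
assumes the caller supplies a buffer of at least 6 octets.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of libc: encode one UCS-4 value into buf
 * following the table above.  Returns the number of octets written (1-6),
 * or 0 if the value is above 0x7fffffff.  buf must hold at least 6 octets.
 */
static size_t
utf8_encode(uint32_t wc, unsigned char *buf)
{
	/* Largest value representable in 1, 2, ... 6 octets. */
	static const uint32_t limit[6] = {
		0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff
	};
	/* Length marker placed in the high bits of the first octet. */
	static const unsigned char lead[6] = {
		0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc
	};
	size_t len, i;

	/* Pick the shortest length whose range contains wc. */
	for (len = 1; len <= 6; len++)
		if (wc <= limit[len - 1])
			break;
	if (len > 6)
		return (0);

	/* Fill continuation octets from the end, 6 bits at a time. */
	for (i = len - 1; i > 0; i--) {
		buf[i] = (unsigned char)(0x80 | (wc & 0x3f));
		wc >>= 6;
	}
	/* The remaining high bits go under the length marker. */
	buf[0] = (unsigned char)(lead[len - 1] | wc);
	return (len);
}
```
.Ed
.Pp
Because the sketch always selects the first range that contains the value,
it can only produce the shortest representation.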
.Pp
If more than a single representation of a value exists (for example,
0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always
used.
Longer ones are detected as an error because they pose a potential
security risk and destroy the 1:1 mapping between characters and octet
sequences.
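.Pp
One way to detect such longer representations is to compare the decoded
value against the smallest value that requires the sequence length in use.
The following C sketch does this; the function
.Fn utf8_decode
is hypothetical and is not the libc implementation.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the libc implementation: decode one character
 * from s (n octets available) into *wc.  Returns the number of octets
 * consumed, or 0 for a truncated, malformed, or non-shortest sequence.
 */
static size_t
utf8_decode(const unsigned char *s, size_t n, uint32_t *wc)
{
	/* Smallest value that legitimately needs each length (index 1-6). */
	static const uint32_t lowest[7] = {
		0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000
	};
	uint32_t v;
	size_t len, i;

	if (n == 0)
		return (0);
	if (s[0] < 0x80) {
		len = 1; v = s[0];
	} else if ((s[0] & 0xe0) == 0xc0) {
		len = 2; v = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		len = 3; v = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		len = 4; v = s[0] & 0x07;
	} else if ((s[0] & 0xfc) == 0xf8) {
		len = 5; v = s[0] & 0x03;
	} else if ((s[0] & 0xfe) == 0xfc) {
		len = 6; v = s[0] & 0x01;
	} else
		return (0);	/* stray continuation octet, 0xfe or 0xff */

	if (n < len)
		return (0);	/* sequence is truncated */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return (0);	/* not of the form 10bbbbbb */
		v = (v << 6) | (s[i] & 0x3f);
	}
	if (v < lowest[len])
		return (0);	/* longer than the shortest form */
	*wc = v;
	return (len);
}
```
.Ed
.Pp
With this sketch, the single octet 0x00 decodes to the value 0, while
0xC0 0x80 and 0xE0 0x80 0x80 are both rejected.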
.Sh SEE ALSO
.Xr euc 5
.Rs
.%A "Rob Pike"
.%A "Ken Thompson"
.%T "Hello World"
.%J "Proceedings of the Winter 1993 USENIX Technical Conference"
.%Q "USENIX Association"
.%D "January 1993"
.Re
.Rs
.%A "F. Yergeau"
.%T "UTF-8, a transformation format of ISO 10646"
.%O "RFC 2279"
.%D "January 1998"
.Re
.Rs
.%Q "The Unicode Consortium"
.%T "The Unicode Standard, Version 3.0"
.%D "2000"
.%O "as amended by the Unicode Standard Annex #27: Unicode 3.1 and by the Unicode Standard Annex #28: Unicode 3.2"
.Re
.Sh STANDARDS
The
.Nm
encoding is compatible with RFC 2279 and Unicode 3.2.