.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Paul Borman at Krystal Technologies.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"	@(#)utf2.4	8.1 (Berkeley) 06/04/93
.\"
.Dd June 4, 1993
.Dt UTF2 4
.Os
.Sh NAME
.Nm UTF2
.Nd Universal character set Transformation Format encoding of runes
.Sh SYNOPSIS
\fBENCODING "UTF2"\fP
.Sh DESCRIPTION
The
.Nm UTF2
encoding is based on the proposed X/Open multibyte
\s-1FSS-UCS-TF\s+1
(File System Safe Universal Character Set Transformation Format)
encoding, as used in Plan 9 from Bell Labs.
Although the encoding can represent values wider than 16 bits,
the current implementation is limited to the 16-bit values defined by the
Unicode Standard.
.Pp
The
.Nm UTF2
representation is backwards compatible with ASCII: the byte values
0x00-0x7f encode the ASCII character set.
The multibyte encoding of a rune between 0x0080 and 0xffff
consists entirely of bytes whose high-order bit is set.
The encoding is defined by the following table:
.Bd -literal
[0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
.Ed
.sp
If a value has more than one representation (for example, 0x00 can be
written as 0x00, as 0xC0 0x80, or as 0xE0 0x80 0x80), the shortest
representation is always generated, but the longer ones are still
decoded correctly.
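.Pp
The mapping can be illustrated with a minimal Python sketch.
It is not the libc implementation; the function names
utf2_encode and utf2_decode are hypothetical and exist only for this example.
The two-byte form carries 11 payload bits, so the sketch uses it for
runes up to 0x07ff and switches to the three-byte form from 0x0800.
The decoder deliberately accepts longer-than-necessary sequences.
.Bd -literal
def utf2_encode(rune):
    """Encode one 16-bit rune using the shortest available form."""
    if not 0 <= rune <= 0xFFFF:
        raise ValueError("rune out of 16-bit range")
    if rune < 0x80:
        # One byte: 0bbbbbbb
        return bytes([rune])
    if rune < 0x800:
        # Two bytes: 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (rune >> 6), 0x80 | (rune & 0x3F)])
    # Three bytes: 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([0xE0 | (rune >> 12),
                  0x80 | ((rune >> 6) & 0x3F),
                  0x80 | (rune & 0x3F)])


def utf2_decode(data):
    """Decode the first rune in data; return (rune, bytes consumed).

    Sequences longer than necessary are accepted, not rejected.
    """
    if len(data) == 0:
        raise ValueError("empty input")
    lead = data[0]
    if lead < 0x80:
        return lead, 1
    if (lead & 0xE0) == 0xC0:
        length, rune = 2, lead & 0x1F
    elif (lead & 0xF0) == 0xE0:
        length, rune = 3, lead & 0x0F
    else:
        raise ValueError("unsupported lead byte")
    if len(data) < length:
        raise ValueError("truncated sequence")
    for b in data[1:length]:
        if (b & 0xC0) != 0x80:
            raise ValueError("bad continuation byte")
        # Each continuation byte contributes six payload bits.
        rune = (rune << 6) | (b & 0x3F)
    return rune, length
.Ed
.Pp
With these definitions, utf2_encode(0x00e9) yields the two bytes
0xc3 0xa9, and utf2_decode applied to any of the three
representations of 0x00 listed above returns the rune 0.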
.Pp
The final three encodings provided by X/Open:
.Bd -literal
[00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
	11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

[000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

[0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
.Ed
.sp
which provide for the entire proposed 31-bit ISO 10646 standard,
are currently not implemented.
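.Pp
Purely to illustrate the bit layout of the longer forms, the following
Python sketch generalizes the scheme to all six sequence lengths.
It does not describe anything the
.Nm UTF2
encoding implements; the names FORMS and fss_utf_encode are
hypothetical and exist only for this example.
.Bd -literal
# (payload bits, lead-byte marker) for sequences of 1 to 6 bytes.
FORMS = [(7, 0x00), (11, 0xC0), (16, 0xE0),
         (21, 0xF0), (26, 0xF8), (31, 0xFC)]


def fss_utf_encode(value):
    """Encode a value of up to 31 bits in the shortest form."""
    if not 0 <= value < (1 << 31):
        raise ValueError("value out of 31-bit range")
    # Pick the first form with enough payload bits for the value.
    for nbytes, (bits, marker) in enumerate(FORMS, start=1):
        if value < (1 << bits):
            break
    # Peel off six bits per continuation byte, lowest bits first.
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (value & 0x3F))
        value >>= 6
    # The remaining high bits share the lead byte with the marker.
    return bytes([marker | value] + tail[::-1])
.Ed
.Pp
For example, fss_utf_encode(0x10000) produces the four bytes
0xf0 0x90 0x80 0x80, the smallest value that needs the first of the
three forms above.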
.Sh "SEE ALSO"
.Xr mklocale 1 ,
.Xr setlocale 3