.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Paul Borman at Krystal Technologies.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"	@(#)utf2.4	8.1 (Berkeley) 06/04/93
.\"
.Dd ""
.Dt UTF2 4
.Os
.Sh NAME
.Nm UTF2
.Nd Universal Character Set Transformation Format encoding of runes
.Sh SYNOPSIS
\fBENCODING "UTF2"\fP
.Sh DESCRIPTION
The
.Nm UTF2
encoding is based on a proposed X/Open multibyte
\s-1FSS-UCS-TF\s+1 (File System Safe Universal Character Set Transformation Format) encoding as used in
Plan 9 from Bell Labs.
Although it is capable of representing values wider than 16 bits,
the current implementation is limited to 16 bits as defined by the
Unicode Standard.
.Pp
The
.Nm UTF2
representation is backward compatible with ASCII, so 0x00-0x7f refer to the
ASCII character set.
The multibyte encodings of runes between 0x0080 and 0xffff
consist entirely of bytes whose high order bit is set.
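.Pp
As an illustration only, the following is a minimal C sketch of the 16-bit scheme; it is not the code used by the system's locale library, and the function names utf2_encode and utf2_decode are hypothetical.
A rune up to 0x7f is stored as a single byte.
The two-byte form 110bbbbb 10bbbbbb carries 11 payload bits and so reaches 0x07ff, and the three-byte form 1110bbbb 10bbbbbb 10bbbbbb carries the full 16 bits.
The decoder accepts longer-than-necessary forms such as 0xC0 0x80 and rejects lead bytes of the unimplemented four- to six-byte forms.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only (hypothetical name): encode a 16-bit rune as 1-3 bytes
 * using the UTF2 bit layout.  out must have room for 3 bytes.
 * Returns the number of bytes written; always the shortest form.
 */
size_t
utf2_encode(uint16_t r, unsigned char *out)
{
	if (r <= 0x007f) {		/* 0bbbbbbb */
		out[0] = (unsigned char)r;
		return 1;
	}
	if (r <= 0x07ff) {		/* 110bbbbb 10bbbbbb */
		out[0] = (unsigned char)(0xc0 | (r >> 6));
		out[1] = (unsigned char)(0x80 | (r & 0x3f));
		return 2;
	}
	/* 1110bbbb 10bbbbbb 10bbbbbb */
	out[0] = (unsigned char)(0xe0 | (r >> 12));
	out[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3f));
	out[2] = (unsigned char)(0x80 | (r & 0x3f));
	return 3;
}

/*
 * Sketch only (hypothetical name): decode one rune from s, which has
 * n bytes available.  Overlong forms such as C0 80 are accepted.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * malformed, truncated, or uses a lead byte of 0xf0 or above.
 */
size_t
utf2_decode(const unsigned char *s, size_t n, uint16_t *r)
{
	if (n >= 1 && s[0] < 0x80) {
		*r = s[0];
		return 1;
	}
	if (n >= 2 && (s[0] & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80) {
		*r = (uint16_t)(((s[0] & 0x1f) << 6) | (s[1] & 0x3f));
		return 2;
	}
	if (n >= 3 && (s[0] & 0xf0) == 0xe0 && (s[1] & 0xc0) == 0x80 &&
	    (s[2] & 0xc0) == 0x80) {
		*r = (uint16_t)(((s[0] & 0x0f) << 12) |
		    ((s[1] & 0x3f) << 6) | (s[2] & 0x3f));
		return 3;
	}
	return 0;
}
```

Because every byte of a multibyte sequence has its high order bit set, a byte below 0x80 in a UTF2 string is always an ASCII character; in particular, NUL and slash never occur inside a multibyte sequence, which is what makes the format safe for file names.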
The encoding is given by the following table:
.Bd -literal
[0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
.Ed
.sp
If more than one representation of a value exists (for example,
0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always
used (but the longer ones will be correctly decoded).
.Pp
The final three encodings provided by X/Open,
which provide for the entire proposed 31-bit ISO 10646 standard,
are currently not implemented:
.Bd -literal
[00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
	11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

[000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

[0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
.Ed
.Sh "SEE ALSO"
.Xr mklocale 1 ,
.Xr setlocale 3