
     TCS 1
 NAME
tcs - translate character sets
 SYNOPSIS
 tcs [ -slcv ] [ -f ics ] [ -t ocs ] [ file ... ]
 DESCRIPTION
 Tcs interprets the named file(s) (standard input default) as a stream
 of characters from the ics character set or format, converts them to
 runes, and then converts them into a stream of characters from the ocs
 character set or format on the standard output. The default value for
 ics and ocs is utf, the UTF encoding described in utf(6). The -l option
 lists the character sets known to tcs. Processing continues in the face
 of conversion errors (the -s option prevents reporting of these
 errors). The -c option forces the output to contain only correctly
 converted characters; otherwise, Runeerror (0xFFFD) characters will be
 substituted for UTF encoding errors and unknown characters.
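
 The two-step conversion described above (input bytes to runes, then
 runes to output bytes) can be sketched in Python. This is only an
 illustration, not the tcs source: the function name translate and the
 only_correct parameter are hypothetical, Python codec names stand in
 for the ics and ocs names, and Python's "replace" handler emits "?" on
 the output side where the text above describes 0xFFFD.

```python
RUNEERROR = "\ufffd"  # the replacement rune named in the text above


def translate(data: bytes, ics: str = "utf-8", ocs: str = "utf-8",
              only_correct: bool = False) -> bytes:
    """Decode data from ics into code points, then encode them as ocs.

    Hypothetical helper; ics and ocs are Python codec names here.
    """
    # Step 1: bytes -> runes. Undecodable input becomes U+FFFD.
    runes = data.decode(ics, errors="replace")
    if only_correct:
        # Loosely mirrors -c: keep only what converted cleanly.
        # Caveat: this also drops a genuine U+FFFD present in the input.
        runes = runes.replace(RUNEERROR, "")
        return runes.encode(ocs, errors="ignore")
    # Step 2: runes -> bytes. Python substitutes "?" for code points
    # that the output character set cannot represent.
    return runes.encode(ocs, errors="replace")
```

 For instance, translate(b"caf\xe9", ics="latin-1") re-encodes the
 Latin-1 byte 0xE9 as the two-byte UTF-8 sequence for the same
 character, while the same input treated as UTF-8 is malformed and
 yields a replacement rune instead.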

 The -v option generates various diagnostic and summary information on
 standard error, or makes the -l output more verbose.

 Tcs recognizes an ever changing list of character sets.
In particular, it supports a variety of Russian and Japanese encodings.
Some of the supported encodings are

 utf The Plan 9 UTF encoding, known by ISO as UTF-8

 utf1 The deprecated original UTF encoding from ISO 10646

 ascii 7-bit ASCII

 8859-1 Latin-1 (Western European)

 8859-2 Latin-2 (Czech .. Slovak)

 8859-3 Latin-3 (Dutch .. Turkish)

 8859-4 Latin-4 (Scandinavian)

 8859-5 Part 5 (Cyrillic)

 8859-6 Part 6 (Arabic)

 8859-7 Part 7 (Greek)

 8859-8 Part 8 (Hebrew)

 8859-9 Latin-5 (Finnish .. Portuguese)

 html Unicode as encoded by HTML

 koi8 KOI-8 (GOST 19769-74)

 jis-kanji ISO 2022-JP

 ujis EUC-JX: JIS 0208

 ms-kanji Microsoft, or Shift-JIS

 jis (-f only) guesses among ISO 2022-JP, EUC, and Shift-JIS

 gb Chinese national standard (GB2312-80)

 big5 Big 5 (HKU version)

 unicode Unicode Standard 1.0

 tis Thai character set plus ASCII (TIS 620-1986)

 msdos IBM PC: CP 437

 atari Atari-ST character set
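
 How the jis input format chooses among its candidates is not specified
 in this page. One plausible approach, shown purely as an assumption, is
 to try each candidate decoder strictly and accept the first that
 succeeds. The function name guess_jis and the use of Python's
 iso2022_jp, euc_jp, and shift_jis codecs are illustrative choices, not
 what tcs is known to do.

```python
def guess_jis(data: bytes) -> str:
    """Return the first candidate codec that decodes data without error.

    Hypothetical heuristic: the order matters, because plain ASCII is
    valid in all three encodings and will report the first candidate.
    """
    for codec in ("iso2022_jp", "euc_jp", "shift_jis"):
        try:
            data.decode(codec)  # strict decoding: raises on invalid bytes
            return codec
        except UnicodeDecodeError:
            continue
    raise ValueError("input matches none of the candidate encodings")
```

 A strict trial decode is a blunt instrument: short inputs can be valid
 in more than one encoding, so a real detector would likely need
 stronger evidence than the absence of errors.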
 EXAMPLES

 tcs -f 8859-1
      Convert 8859-1 (Latin-1) characters into UTF format.

 tcs -s -f jis
      Convert characters encoded in one of several shift JIS encodings
      into UTF format. Unknown Kanji will be converted into 0xFFFD
      characters.

 tcs -t html
      Convert UTF into character set-independent HTML.

 tcs -lv
      Print an up-to-date list of the supported character sets.
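
 To illustrate what "character set-independent HTML" can mean, the
 following Python sketch replaces every non-ASCII character with a
 numeric character reference. The exact output of tcs -t html (for
 example, whether it prefers named or numeric references) is not stated
 in this page, so the function to_html below is a hypothetical stand-in
 rather than a reproduction of it.

```python
def to_html(text: str) -> str:
    """Return text with non-ASCII characters as numeric references.

    ASCII passes through unchanged; markup characters such as "<" and
    "&" are not escaped by this sketch.
    """
    encoded = text.encode("ascii", errors="xmlcharrefreplace")
    return encoded.decode("ascii")
```

 The result contains only ASCII bytes, so it survives transport through
 any ASCII-compatible character set, which is the sense in which the
 output is independent of the character set.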
 SOURCE
 /sys/src/cmd/tcs
 SEE ALSO
 ascii(1), rune(2), utf(6).