Encode/JP/JP.pm

package Encode::JP;
BEGIN {
    if (ord("A") == 193) {
	die "Encode::JP not supported on EBCDIC\n";
    }
}
use Encode;
our $VERSION = do { my @r = (q$Revision: 1.25 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r };

use XSLoader;
XSLoader::load(__PACKAGE__,$VERSION);

use Encode::JP::JIS7;

1;
__END__

=head1 NAME

Encode::JP - Japanese Encodings

=head1 SYNOPSIS

    use Encode qw/encode decode/;
    $euc_jp = encode("euc-jp", $utf8);   # loads Encode::JP implicitly
    $utf8   = decode("euc-jp", $euc_jp); # ditto
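
The fragments above assume $utf8 already holds a Perl character string.
A minimal self-contained round trip might look like the sketch below. The
sample text (U+65E5 U+672C U+8A9E, "nihongo") is an arbitrary choice for
illustration, written with \x{...} escapes so the script does not depend
on the encoding of its own source file.

```perl

    use strict;
    use warnings;
    use Encode qw(encode decode);

    # Sample text chosen for illustration: U+65E5 U+672C U+8A9E ("nihongo").
    my $text = "\x{65E5}\x{672C}\x{8A9E}";

    my $euc_jp = encode('euc-jp', $text);    # characters -> EUC-JP bytes
    my $back   = decode('euc-jp', $euc_jp);  # EUC-JP bytes -> characters

    printf "EUC-JP bytes: %s\n", uc(unpack('H*', $euc_jp));
    printf "round trip: %s\n", $back eq $text ? 'ok' : 'FAILED';

    # Encode loads Encode::JP on demand the first time a Japanese
    # encoding is requested, so it should now appear in %INC.
    printf "Encode::JP loaded: %s\n",
        exists $INC{'Encode/JP.pm'} ? 'yes' : 'no';

```

For this input the first line printed should be C6FCCBDCB8EC, the EUC-JP
codes of the three kanji.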

=head1 ABSTRACT

This module implements Japanese charset encodings.  The following
encodings are supported:

  Canonical      Alias              Description
  --------------------------------------------------------------------
  euc-jp         /\beuc.*jp$/i      EUC (Extended Unix Code)
                 /\bjp.*euc/i
                 /\bujis$/i
  shiftjis       /\bshift.*jis$/i   Shift JIS (aka MS Kanji)
                 /\bsjis$/i
  7bit-jis       /\bjis$/i          7bit JIS
  iso-2022-jp                       ISO-2022-JP    [RFC1468]
                                    = 7bit JIS with all Halfwidth Kana
                                      converted to Fullwidth
  iso-2022-jp-1                     ISO-2022-JP-1  [RFC2237]
                                    = ISO-2022-JP with JIS X 0212-1990
                                      support.  See below
  MacJapanese                       Shift JIS + Apple vendor mappings
  cp932                             Code Page 932
                                    = Shift JIS + MS/IBM vendor mappings
  jis0201-raw                       JIS X 0201, raw format
  jis0208-raw                       JIS X 0208, raw format
  jis0212-raw                       JIS X 0212, raw format
  --------------------------------------------------------------------
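
Because the aliases are regular expressions, it can be handy to check
which canonical name a given spelling resolves to.  One way, sketched
below, is Encode's find_encoding function, whose returned object has a
name method giving the canonical name.  The alias spellings in the list
are arbitrary examples, not an exhaustive set.

```perl

    use strict;
    use warnings;
    use Encode qw(find_encoding);

    # Arbitrary alias spellings, each expected to match an entry above.
    my @aliases = qw(EUC-JP ujis x-euc-jp Shift_JIS sjis jis cp932);

    for my $alias (@aliases) {
        my $enc = find_encoding($alias);    # undef if no encoding matches
        printf "%-10s => %s\n", $alias,
            defined $enc ? $enc->name : '(unknown)';
    }

```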

=head1 DESCRIPTION

To find out how to use this module in detail, see L<Encode>.

=head1 Note on ISO-2022-JP(-1)?

ISO-2022-JP-1 (RFC2237) is a superset of ISO-2022-JP (RFC1468) that
adds support for JIS X 0212-1990.  That means either name can be used
to decode a stream to utf8, but the two are not interchangeable when
encoding.

  $utf8 = decode('iso-2022-jp-1', $stream);

and

  $utf8 = decode('iso-2022-jp',   $stream);

yield the same result, but

  $with_0212 = encode('iso-2022-jp-1', $utf8);

is different from

  $without_0212 = encode('iso-2022-jp', $utf8);

In the latter case, characters that map to JIS X 0212 are first
converted to U+3013 (0xA2AE in EUC-JP; a white square also known as
'Tofu' or 'geta mark') and encoded as that character instead.  U+FFFD
is not used, in order to preserve text layout as much as possible.
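
To see the difference in practice, the sketch below encodes a short
string under both names.  It assumes U+00E9 (LATIN SMALL LETTER E WITH
ACUTE) is one of the characters covered by JIS X 0212-1990 but not by
JIS X 0208.  It only checks that the two outputs differ and that either
name decodes the ISO-2022-JP-1 bytes back to the original; it does not
assert which replacement iso-2022-jp emits.

```perl

    use strict;
    use warnings;
    use Encode qw(encode decode);

    # Assumption: U+00E9 exists in JIS X 0212 but has no JIS X 0208 code.
    my $text = "caf\x{e9}";

    my $with_0212    = encode('iso-2022-jp-1', $text);
    my $without_0212 = encode('iso-2022-jp',   $text);

    # ESC $ ( D is the escape sequence that designates JIS X 0212.
    printf "iso-2022-jp-1 output uses JIS X 0212: %s\n",
        index($with_0212, "\e\$(D") >= 0 ? 'yes' : 'no';
    printf "the two encoded outputs differ: %s\n",
        $with_0212 ne $without_0212 ? 'yes' : 'no';

    # Either name should decode the ISO-2022-JP-1 bytes to the same string.
    for my $name ('iso-2022-jp-1', 'iso-2022-jp') {
        printf "%-13s decodes back to the original: %s\n", $name,
            decode($name, $with_0212) eq $text ? 'yes' : 'no';
    }

```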

=head1 BUGS

The ASCII region (0x00-0x7f) is preserved for all encodings, even
though this conflicts with mappings by the Unicode Consortium.  See

L<http://www.debian.or.jp/~kubota/unicode-symbols.html.en>

to find out why it is implemented that way.
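
The conflict presumably centres on 0x5C and 0x7E, which JIS X 0201
Roman assigns to YEN SIGN and OVERLINE rather than backslash and tilde.
A quick check that those bytes pass through unchanged might look like
this; the encodings and the sample string are arbitrary choices for
illustration.

```perl

    use strict;
    use warnings;
    use Encode qw(encode decode);

    # Contains backslash (0x5C) and tilde (0x7E); the Perl literal "\\"
    # is a single backslash character.
    my $ascii = "C:\\dir ~user";

    for my $name (qw(euc-jp shiftjis cp932 iso-2022-jp)) {
        my $bytes = encode($name, $ascii);
        printf "%-12s bytes unchanged: %s, decodes to ASCII: %s\n", $name,
            $bytes eq $ascii                ? 'yes' : 'no',
            decode($name, $bytes) eq $ascii ? 'yes' : 'no';
    }

```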

=head1 SEE ALSO

L<Encode>

=cut