=head1 NAME

Locale::Script - ISO codes for script identification (ISO 15924)

=head1 SYNOPSIS

    use Locale::Script;
    use Locale::Constants;

    $script  = code2script('ph');                       # 'Phoenician'
    $code    = script2code('Tibetan');                  # 'bo'
    $code3   = script2code('Tibetan',
                           LOCALE_CODE_ALPHA_3);        # 'bod'
    $codeN   = script2code('Tibetan',
                           LOCALE_CODE_NUMERIC);        # 330

    @codes   = all_script_codes();
    @scripts = all_script_names();


=head1 DESCRIPTION

The C<Locale::Script> module provides access to the ISO
codes for identifying scripts, as defined in ISO 15924.
For example, Egyptian hieroglyphs are denoted by the two-letter
code 'eg', the three-letter code 'egy', and the numeric code 050.

You can access the codes either via the conversion routines
(described below) or with the two functions that return lists
of all script codes or all script names.

There are three different code sets you can use for identifying
scripts:

=over 4

=item B<alpha-2>

Two-letter codes, such as 'bo' for Tibetan.
This code set is identified by the symbol C<LOCALE_CODE_ALPHA_2>.

=item B<alpha-3>

Three-letter codes, such as 'ell' for Greek.
This code set is identified by the symbol C<LOCALE_CODE_ALPHA_3>.

=item B<numeric>

Numeric codes, such as 410 for Hiragana.
This code set is identified by the symbol C<LOCALE_CODE_NUMERIC>.

=back

All of the routines except C<script_code2code()> take an optional
additional argument which specifies the code set to use.
If not specified, it defaults to the two-letter codes.
This is partly for backwards compatibility (previous versions
of the Locale modules only supported the alpha-2 codes), and
partly because they are the most widely used codes.

The alpha-2 and alpha-3 codes are case-insensitive,
so you can use 'BO', 'Bo', 'bO' or 'bo' for Tibetan.
When a code is returned by one of the functions in
this module, it will always be lower-case.

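As a brief sketch of how the optional code set argument behaves,
the following uses the Tibetan codes quoted above.
It assumes the version of C<Locale::Script> documented here, with
C<Locale::Constants> supplying the code set symbols; the values in
the comments are taken from this document, not from a test run.

    use strict;
    use warnings;
    use Locale::Script;
    use Locale::Constants;

    # With no code set argument, the alpha-2 code set is used.
    my $default = script2code('Tibetan');                       # 'bo'
    my $alpha2  = script2code('Tibetan', LOCALE_CODE_ALPHA_2);  # 'bo'
    my $numeric = script2code('Tibetan', LOCALE_CODE_NUMERIC);  # 330

    # Codes passed in may use any mix of upper and lower case.
    for my $code (qw(BO Bo bO bo)) {
        my $name = code2script($code);
        print "$code => ", (defined $name ? $name : 'undef'), "\n";
    }
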
=head2 SPECIAL CODES

The standard defines various special codes.

=over 4

=item *

The standard reserves codes in the ranges B<qa> - B<qt>,
B<qaa> - B<qat>, and B<900> - B<919> for private use.

=item *

B<zx>, B<zxx>, and B<997> are the codes for unwritten languages.

=item *

B<zy>, B<zyy>, and B<998> are the codes for an undetermined script.

=item *

B<zz>, B<zzz>, and B<999> are the codes for an uncoded script.

=back

The private-use codes are not recognised by C<Locale::Script>;
the other special codes are.


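The difference between the private-use codes and the other special
codes can be checked with C<code2script()>, which returns C<undef>
for any code it does not recognise.
This is a minimal sketch that assumes the behaviour described above.
The names returned for the special codes are not listed in this
document, so the example only tests whether a name is defined.

    use strict;
    use warnings;
    use Locale::Script;

    # 'qa' lies in the private-use range; the others are special codes.
    for my $code (qw(qa zx zy zz)) {
        my $name = code2script($code);
        printf "%-2s is %s\n", $code,
            defined $name ? "recognised ($name)" : 'not recognised';
    }

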
=head1 CONVERSION ROUTINES

There are three conversion routines: C<code2script()>, C<script2code()>,
and C<script_code2code()>.

=over 4

=item code2script( CODE, [ CODESET ] )

This function takes a script code and returns a string
which contains the name of the script identified.
If the code is not a valid script code, as defined by ISO 15924,
then C<undef> will be returned:

    $script = code2script('cy');   # Cyrillic

=item script2code( STRING, [ CODESET ] )

This function takes a script name and returns the corresponding
script code, if one exists.
If the argument could not be identified as a script name,
then C<undef> will be returned:

    $code = script2code('Gothic', LOCALE_CODE_ALPHA_3);
    # $code will now be 'gth'

The case of the script name is not important, but the name must
otherwise match exactly; see L</"KNOWN BUGS AND LIMITATIONS"> below.

=item script_code2code( CODE, CODESET, CODESET )

This function takes a script code from one code set,
and returns the corresponding code from another code set.

    # Use a plain comma between the code sets: '=>' would quote
    # the name of the constant on its left instead of calling it.
    $alpha2 = script_code2code('jwi',
                 LOCALE_CODE_ALPHA_3, LOCALE_CODE_ALPHA_2);
    # $alpha2 will now be 'jw' (Javanese)

If the code passed is not a valid script code in
the first code set, or if there isn't a code for the
corresponding script in the second code set,
then C<undef> will be returned.

=back


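The three routines can be combined.
The sketch below converts a name to its alpha-3 code, maps that to
the numeric code set, and converts the result back to a name,
checking for C<undef> at each step.
It assumes the Gothic entry quoted above is present in the module's
data; the helper C<describe_script()> is invented for this
illustration and is not part of the module.

    use strict;
    use warnings;
    use Locale::Script;
    use Locale::Constants;

    # Hypothetical helper: not part of Locale::Script.
    sub describe_script {
        my ($name) = @_;

        my $alpha3 = script2code($name, LOCALE_CODE_ALPHA_3);
        return "'$name' is not a known script name"
            unless defined $alpha3;

        my $numeric = script_code2code($alpha3,
                          LOCALE_CODE_ALPHA_3, LOCALE_CODE_NUMERIC);
        return "$name: alpha-3 '$alpha3', no numeric code"
            unless defined $numeric;

        # Fall back to the caller's spelling if the reverse lookup fails.
        my $official = code2script($numeric, LOCALE_CODE_NUMERIC);
        $official = $name unless defined $official;

        return "$official: alpha-3 '$alpha3', numeric $numeric";
    }

    print describe_script('Gothic'), "\n";
    print describe_script('No such script'), "\n";

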
=head1 QUERY ROUTINES

There are two functions which can be used to obtain a list of all codes,
or all script names:

=over 4

=item C<all_script_codes ( [ CODESET ] )>

Returns a list of all script codes in the specified code set
(the two-letter codes if no code set is given).
The codes are guaranteed to be all lower-case,
and are not returned in any particular order.

=item C<all_script_names ( [ CODESET ] )>

Returns a list of all script names for which there is a corresponding
script code in the specified code set.
The names are capitalised, and not returned in any particular order.

=back


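As a sketch of how the query routines can be combined with
C<script2code()>, the following prints every script name alongside
its alpha-3 code.
Neither routine guarantees an order, so the names are sorted
explicitly.
It assumes the behaviour described above, in particular that a name
is only listed if it has a code in the requested code set.

    use strict;
    use warnings;
    use Locale::Script;
    use Locale::Constants;

    my @names = sort { lc $a cmp lc $b }
                all_script_names(LOCALE_CODE_ALPHA_3);

    for my $name (@names) {
        my $code = script2code($name, LOCALE_CODE_ALPHA_3);
        next unless defined $code;    # defensive: skip failed lookups
        printf "%-3s  %s\n", $code, $name;
    }

    my @codes = all_script_codes(LOCALE_CODE_ALPHA_3);
    printf "%d alpha-3 codes in total\n", scalar @codes;

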
=head1 EXAMPLES

The following example illustrates use of the C<code2script()> function.
The user is prompted for a script code, and then told the corresponding
script name:

    use Locale::Script;
    use Locale::Constants;

    $| = 1;   # turn off output buffering

    print "Enter script code: ";
    chomp($code = <STDIN>);   # remove the trailing newline, if any
    $script = code2script($code, LOCALE_CODE_ALPHA_2);
    if (defined $script)
    {
        print "$code = $script\n";
    }
    else
    {
        print "'$code' is not a valid script code!\n";
    }


=head1 KNOWN BUGS AND LIMITATIONS

=over 4

=item *

When using C<script2code()>, the script name must currently match
the name in the source of the module exactly, apart from case.
For example,

    script2code('Egyptian hieroglyphs')

will return B<eg>, as expected. But the following will all return C<undef>:

    script2code('hieroglyphs')
    script2code('Egyptian Hieroglypics')

If there is a need for it, a future version could support variants
of script names.

=item *

In the current implementation, all data is read in when the
module is loaded, and then held in memory.
A lazy implementation would be more memory-friendly.

=back

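Until name variants are supported, a caller can keep its own alias
table and normalise names before calling C<script2code()>.
The following is only a sketch of that idea: the C<%alias> table and
the C<script2code_loose()> wrapper are hypothetical and are not part
of C<Locale::Script>.

    use strict;
    use warnings;
    use Locale::Script;

    # Hypothetical caller-side aliases, keyed in lower case.
    my %alias = (
        'hieroglyphs' => 'Egyptian hieroglyphs',
    );

    # Hypothetical wrapper: tidy whitespace, apply an alias, look up.
    sub script2code_loose {
        my ($name, @codeset) = @_;
        return undef unless defined $name;

        $name =~ s/^\s+//;      # strip leading whitespace
        $name =~ s/\s+$//;      # strip trailing whitespace
        $name =~ s/\s+/ /g;     # collapse internal runs of whitespace
        $name = $alias{lc $name} if exists $alias{lc $name};

        return script2code($name, @codeset);
    }

    my $code = script2code_loose('  hieroglyphs ');
    print defined $code ? "$code\n" : "no match\n";

Misspellings such as 'Egyptian Hieroglypics' would still need an
explicit entry in the alias table.
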
=head1 SEE ALSO

=over 4

=item Locale::Language

ISO two-letter codes for identification of language (ISO 639).

=item Locale::Currency

ISO three-letter codes for identification of currencies
and funds (ISO 4217).

=item Locale::Country

ISO codes for identification of countries (ISO 3166).

=item ISO 15924

The ISO standard which defines these codes.

=item http://www.evertype.com/standards/iso15924/

Home page for ISO 15924.

=back


=head1 AUTHOR

Neil Bowers E<lt>neil@bowers.comE<gt>

=head1 COPYRIGHT

Copyright (c) 2002 Neil Bowers.

This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=cut