package bytes;

our $VERSION = '1.01';

$bytes::hint_bits = 0x00000008;

sub import {
    $^H |= $bytes::hint_bits;
}

sub unimport {
    $^H &= ~$bytes::hint_bits;
}

sub AUTOLOAD {
    require "bytes_heavy.pl";
    goto &$AUTOLOAD;
}

sub length ($);
sub chr ($);
sub ord ($);
sub substr ($$;$$);
sub index ($$;$);
sub rindex ($$;$);

1;
__END__

=head1 NAME

bytes - Perl pragma to force byte semantics rather than character semantics

=head1 SYNOPSIS

    use bytes;
    ... chr(...);       # or bytes::chr
    ... index(...);     # or bytes::index
    ... length(...);    # or bytes::length
    ... ord(...);       # or bytes::ord
    ... rindex(...);    # or bytes::rindex
    ... substr(...);    # or bytes::substr
    no bytes;


=head1 DESCRIPTION

The C<use bytes> pragma disables character semantics for the rest of the
lexical scope in which it appears.  C<no bytes> can be used to reverse
the effect of C<use bytes> within the current lexical scope.
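
The lexical scoping described above can be sketched as follows.  This is
only an illustration, not part of the pragma itself; it assumes that
C<chr(400)> occupies two bytes in Perl's internal UTF-8 encoding, and the
variable name C<$x> is arbitrary:

    my $x = chr(400);               # one character, two bytes
    {
        use bytes;
        print length($x), "\n";     # byte semantics: prints 2
        {
            no bytes;
            print length($x), "\n"; # character semantics again: prints 1
        }
        print length($x), "\n";     # back in the "use bytes" scope: 2
    }
    print length($x), "\n";         # outside the block: prints 1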

Perl normally assumes character semantics in the presence of character
data (i.e. data that has come from a source that has been marked as
being of a particular character encoding).  When C<use bytes> is in
effect, the encoding is temporarily ignored, and each string is treated
as a series of bytes.

As an example, when Perl sees C<$x = chr(400)>, it encodes the character
in UTF-8 and stores it in C<$x>.  The string is then marked as character
data, so, for instance, C<length $x> returns C<1>.  However, in the scope
of the C<bytes> pragma, C<$x> is treated as a series of bytes (the bytes
that make up the UTF-8 encoding), and C<length $x> returns C<2>:

    $x = chr(400);
    print "Length is ", length $x, "\n";     # "Length is 1"
    printf "Contents are %vd\n", $x;         # "Contents are 400"
    {
        use bytes; # or "require bytes; bytes::length()"
        print "Length is ", length $x, "\n"; # "Length is 2"
        printf "Contents are %vd\n", $x;     # "Contents are 198.144"
    }

chr(), ord(), substr(), index() and rindex() behave similarly.

For more on the implications of, and differences between, character
semantics and byte semantics, see L<perluniintro> and L<perlunicode>.

=head1 LIMITATIONS

bytes::substr() does not work as an lvalue.
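
One possible way around this, sketched here under the assumption that the
built-in substr() is acceptable in place of bytes::substr(), is to assign
through the built-in inside a C<use bytes> scope.  The variable C<$buf> is
hypothetical and holds plain ASCII so that the result is unambiguous:

    my $buf = "abcdef";
    # bytes::substr($buf, 0, 2) = "XY";  # not allowed: not an lvalue sub
    {
        use bytes;
        substr($buf, 0, 2) = "XY";       # built-in substr, byte offsets
    }
    print "$buf\n";                      # prints "XYcdef"

Be careful when doing this to character data: replacing part of a
multi-byte character may leave a string that is no longer valid UTF-8.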

=head1 SEE ALSO

L<perluniintro>, L<perlunicode>, L<utf8>

=cut