package bytes;

our $VERSION = '1.03';

# Bit in the compile-time hints variable $^H that selects byte semantics.
$bytes::hint_bits = 0x00000008;

# "use bytes" sets the hint bit for the enclosing lexical scope.
sub import {
    $^H |= $bytes::hint_bits;
}

# "no bytes" clears it again.
sub unimport {
    $^H &= ~$bytes::hint_bits;
}

# The bytes::* functions are loaded on first use from bytes_heavy.pl.
sub AUTOLOAD {
    require "bytes_heavy.pl";
    goto &$AUTOLOAD if defined &$AUTOLOAD;
    require Carp;
    Carp::croak("Undefined subroutine $AUTOLOAD called");
}

# Forward declarations with prototypes.
sub length (_);
sub chr (_);
sub ord (_);
sub substr ($$;$$);
sub index ($$;$);
sub rindex ($$;$);

1;
__END__

=head1 NAME

bytes - Perl pragma to force byte semantics rather than character semantics

=head1 SYNOPSIS

    use bytes;
    ... chr(...);       # or bytes::chr
    ... index(...);     # or bytes::index
    ... length(...);    # or bytes::length
    ... ord(...);       # or bytes::ord
    ... rindex(...);    # or bytes::rindex
    ... substr(...);    # or bytes::substr
    no bytes;

=head1 DESCRIPTION

The C<use bytes> pragma disables character semantics for the rest of the
lexical scope in which it appears.  C<no bytes> can be used to reverse
the effect of C<use bytes> within the current lexical scope.

Perl normally assumes character semantics in the presence of character
data (i.e. data that has come from a source that has been marked as
being of a particular character encoding).  When C<use bytes> is in
effect, the encoding is temporarily ignored, and each string is treated
as a series of bytes.

As an example, when Perl sees C<$x = chr(400)>, it encodes the character
in UTF-8 and stores it in C<$x>.  The string is then marked as character
data, so, for instance, C<length $x> returns C<1>.  However, in the scope
of the C<bytes> pragma, C<$x> is treated as a series of bytes (the bytes
that make up the UTF-8 encoding), and C<length $x> returns C<2>:

    $x = chr(400);
    print "Length is ", length $x, "\n";     # "Length is 1"
    printf "Contents are %vd\n", $x;         # "Contents are 400"
    {
        use bytes; # or "require bytes; bytes::length()"
        print "Length is ", length $x, "\n"; # "Length is 2"
        printf "Contents are %vd\n", $x;     # "Contents are 198.144"
    }

chr(), ord(), substr(), index() and rindex() behave similarly.

For more on the implications and differences between character
semantics and byte semantics, see L<perluniintro> and L<perlunicode>.

=head1 LIMITATIONS

bytes::substr() does not work as an lvalue.

=head1 SEE ALSO

L<perluniintro>, L<perlunicode>, L<utf8>

=cut