package PerlIO;

our $VERSION = '1.04';

# Map layer name to package that defines it
our %alias;

# Called as "use PerlIO 'foo'": load the package implementing each layer.
sub import
{
    my $class = shift;
    while (@_)
    {
        my $layer = shift;
        if (exists $alias{$layer})
        {
            $layer = $alias{$layer};
        }
        else
        {
            $layer = "${class}::$layer";
        }
        eval "require $layer";
        warn $@ if $@;
    }
}

sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

  open($fh,"<:crlf", "my.txt"); # support platform-native and CRLF text files

  open($fh,"<","his.jpg");      # portably open a binary file for reading
  binmode($fh);

  Shell:
    PERLIO=perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification then C code performs the equivalent of:

  use PerlIO 'foo';

The perl code in PerlIO.pm then attempts to locate a layer by doing

  require PerlIO::foo;

Otherwise the C<PerlIO> package is a place holder for additional
PerlIO related functions.

The following layers are currently defined:

=over 4

=item :unix

Lowest level layer which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).

=item :stdio

Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc.  Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.

=item :perlio

A from-scratch implementation of buffering for PerlIO. Provides fast
access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt>
and in general attempts to minimize data copying.

C<:perlio> will insert a C<:unix> layer below itself to do low level IO.

=item :crlf

A layer that implements DOS/Windows like CRLF line endings.
On read it
converts pairs of CR,LF to a single "\n" newline character.  On write
it converts each "\n" to a CR,LF pair.  Note that this layer likes to be
one of its kind: it silently ignores attempts to be pushed into the
layer stack more than once.

It currently does I<not> mimic MS-DOS in treating Control-Z
as an end-of-file marker.

(Gory details follow) To be more exact what happens is this: after
pushing itself to the stack, the C<:crlf> layer checks all the layers
below itself to find the first layer that is capable of being a CRLF
layer but is not yet enabled to be a CRLF layer.  If it finds such a
layer, it enables the CRLFness of that other deeper layer, and then
pops itself off the stack.  If not, fine, use the one we just pushed.

The end result is that a C<:crlf> means "please enable the first CRLF
layer you can find, and if you can't find one, here would be a good
spot to place a new one."

Based on the C<:perlio> layer.

=item :mmap

A layer which implements "reading" of files by using C<mmap()> to
make the (whole) file appear in the process's address space, and then
using that as PerlIO's "buffer". This I<may> be faster in certain
circumstances for large files, and may result in less physical memory
use when multiple processes are reading the same file.

Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
layer. Writes also behave like the C<:perlio> layer as C<mmap()> for write
needs extra house-keeping (to extend the file) which negates any advantage.

The C<:mmap> layer will not exist if the platform does not support C<mmap()>.

=item :utf8

Declares that the stream accepts perl's I<internal> encoding of
characters.  (Which really is UTF-8 on ASCII machines, but is
UTF-EBCDIC on EBCDIC machines.)  This allows any character perl can
represent to be read from or written to the stream.
The UTF-X encoding
is chosen to render simple text parts (i.e. non-accented letters,
digits and common punctuation) human readable in the encoded file.

Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
and then read it back in.

    open(my $fh, ">:utf8", "data.utf") or die "Cannot write data.utf: $!";
    print $fh $out;
    close($fh);

    open($fh, "<:utf8", "data.utf") or die "Cannot read data.utf: $!";
    $in = <$fh>;
    close($fh);

Note that this layer does not validate byte sequences. For reading
input, using C<:encoding(utf8)> instead of bare C<:utf8> is strongly
recommended.

=item :bytes

This is the inverse of the C<:utf8> layer. It turns off the flag
on the layer below so that data read from it is considered to
be "octets" i.e. characters in range 0..255 only. Likewise
on output perl will warn if a "wide" character is written
to such a stream.

=item :raw

The C<:raw> layer is I<defined> as being identical to calling
C<binmode($fh)> - the stream is made suitable for passing binary data
i.e. each byte is passed as-is. The stream will still be
buffered.

In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
referred to as a "discipline") is documented as the inverse of the
C<:crlf> layer. That is no longer the case - other layers which would
alter the binary nature of the stream are also disabled.  If you want UNIX
line endings on a platform that normally does CRLF translation, but still
want UTF-8 or encoding defaults, the appropriate thing to do is to add
C<:perlio> to the PERLIO environment variable.

The implementation of C<:raw> is as a pseudo-layer which when "pushed"
pops itself and then any layers which do not declare themselves as suitable
for binary data. (Undoing C<:utf8> and C<:crlf> is implemented by clearing
flags rather than popping layers but that is an implementation detail.)
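
As a rough illustration of the popping behaviour described above, the
following sketch compares the layer stack before and after C<:raw> is
applied, using C<PerlIO::get_layers>.  The scratch file name
C<raw_demo.tmp> is an assumption made for this example only, and the
exact layer names printed depend on the platform and on how perl was
built, so no particular output is promised here.

    use strict;
    use warnings;

    # Hypothetical scratch file, used only for this illustration.
    my $path = "raw_demo.tmp";

    # Open with two extra layers requested on top of the platform defaults.
    open(my $fh, ">:crlf:utf8", $path) or die "Cannot open $path: $!";
    my @before = PerlIO::get_layers($fh);

    # :raw removes (or switches off) whatever is unsuitable for binary data.
    binmode($fh, ":raw") or die "binmode failed: $!";
    my @after = PerlIO::get_layers($fh);

    print "before :raw: @before\n";
    print "after  :raw: @after\n";

    close($fh) or die "Cannot close $path: $!";
    unlink($path);

The C<utf8> entry should be present in the first list and absent from the
second; whether a C<crlf> entry remains visible is platform dependent.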

As a consequence of the fact that C<:raw> normally pops layers
it usually only makes sense to have it as the only or first element in
a layer specification.  When used as the first element it provides
a known base on which to build e.g.

    open($fh,"<:raw:utf8",...)

will construct a "binary" stream, but then enable UTF-8 translation.

=item :pop

A pseudo layer that removes the top-most layer. Gives perl code
a way to manipulate the layer stack. Should be considered
experimental. Note that C<:pop> only works on real layers
and will not undo the effects of pseudo layers like C<:utf8>.
An example of a possible use might be:

    open($fh,...)
    ...
    binmode($fh,":encoding(...)");  # next chunk is encoded
    ...
    binmode($fh,":pop");            # back to un-encoded

A more elegant (and safer) interface is needed.

=item :win32

On Win32 platforms this I<experimental> layer uses native "handle" IO
rather than the unix-like numeric file descriptor layer. Known to be
buggy as of perl 5.8.2.

=back

=head2 Custom Layers

It is possible to write custom layers in addition to the above builtin
ones, both in C/XS and Perl.  Two such layers (and one example written
in Perl using the latter) come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> either in open() or binmode() to install
a layer that transparently does character set and encoding transformations,
for example from Shift-JIS to Unicode.  Note that under C<stdio>
an C<:encoding> also enables C<:utf8>.  See L<PerlIO::encoding>
for more information.

=item :via

Use C<:via(MODULE)> either in open() or binmode() to install a layer
that does whatever transformation (for example compression /
decompression, encryption / decryption) to the filehandle.
See L<PerlIO::via> for more information.
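
A minimal sketch of a Perl-level layer for use with C<:via> might look
like the following.  The package name C<PerlIO::via::Upcase> and the
scratch file name are hypothetical and exist only for this illustration;
the C<PUSHED>, C<FILL> and C<WRITE> method names are the ones described
in L<PerlIO::via>, which remains the authoritative reference for the
interface.

    use strict;
    use warnings;

    # Hypothetical layer that upper-cases everything passing through it.
    package PerlIO::via::Upcase;

    # Called when the layer is pushed; returns the object for this handle.
    sub PUSHED {
        my ($class, $mode, $fh) = @_;
        return bless {}, $class;
    }

    # Called to read: fetch a line from the layer below and transform it.
    sub FILL {
        my ($self, $fh) = @_;
        my $line = <$fh>;
        return defined $line ? uc $line : undef;
    }

    # Called to write: transform the buffer and pass it to the layer below.
    sub WRITE {
        my ($self, $buf, $fh) = @_;
        print {$fh} uc $buf or return -1;
        return length $buf;
    }

    package main;

    my $path = "via_demo.tmp";    # hypothetical scratch file

    open(my $out, ">:via(Upcase)", $path) or die "Cannot open $path: $!";
    print {$out} "hello, layers\n";
    close($out) or die "Cannot close $path: $!";

    # Read the file back without the layer to see what was stored.
    open(my $in, "<", $path) or die "Cannot open $path: $!";
    my $text = do { local $/; <$in> };
    close($in);
    unlink($path);

    print $text;

If the layer behaves as intended, the text stored in the file is the
upper-cased form of what was printed.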

=back

=head2 Alternatives to raw

To get a binary stream an alternate method is to use:

    open($fh,"whatever");
    binmode($fh);

This has the advantage of being backward compatible with how such things have
had to be coded on some platforms for years.

To get an un-buffered stream specify an unbuffered layer (e.g. C<:unix>)
in the open call:

    open($fh,"<:unix",$path);

=head2 Defaults and how to override them

If the platform is MS-DOS like and normally does CRLF to "\n"
translation for text files then the default layers are:

  unix crlf

(The low level "unix" layer may be replaced by a platform specific low
level layer.)

Otherwise if C<Configure> found out how to do "fast" IO using the system's
stdio, then the default layers are:

  unix stdio

Otherwise the default layers are

  unix perlio

These defaults may change once perlio has been better tested and tuned.

The default can be overridden by setting the environment variable
PERLIO to a space separated list of layers (C<unix> or platform low
level layer is always pushed first).

This can be used to see the effect of/bugs in the various layers e.g.

  cd .../perl/t
  PERLIO=stdio  ./perl harness
  PERLIO=perlio ./perl harness

For the various values of PERLIO see L<perlrun/PERLIO>.

=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

   my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them.  Note that the "default stack" depends on the operating
system and on the Perl version, and both the compile-time and
runtime configurations of Perl.
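
As a small sketch of the call above, the following loop prints the layers
found on the three standard handles.  What it prints is deliberately not
shown, since it varies with the platform, the build configuration and the
C<PERLIO> environment variable.

    use strict;
    use warnings;

    # Pair each standard handle with a label for the report.
    my @handles = (
        [ STDIN  => \*STDIN  ],
        [ STDOUT => \*STDOUT ],
        [ STDERR => \*STDERR ],
    );

    my %layers_for;
    for my $entry (@handles) {
        my ($label, $handle) = @$entry;
        my @layers = PerlIO::get_layers($handle);
        $layers_for{$label} = \@layers;
        print "$label: @layers\n";
    }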

The following table summarizes the default layers on UNIX-like and
DOS-like platforms, depending on the setting of C<$ENV{PERLIO}>:

 PERLIO     UNIX-like                DOS-like
 ------     ---------                --------
 unset / "" unix perlio / stdio [1]  unix crlf
 stdio      unix perlio / stdio [1]  stdio
 perlio     unix perlio              unix perlio
 mmap       unix mmap                unix mmap

 # [1] "stdio" if Configure found out how to do "fast stdio" (depends
 # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"

By default the layers from the input side of the filehandle are
returned; to get the output side use the optional C<output> argument:

   my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle but
for example with sockets there may be differences, or if you have
been using the C<open> pragma.)

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that.  This is not
accidental or unintentional.  The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<utf8>) are not real
layers but instead flags on real layers: to get all of these returned
separately use the optional C<details> argument:

   my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.
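
One way to walk such a list is to take the elements three at a time, as
in the sketch below.  It assumes the list really does arrive as complete
name/arguments/flags triples, and it uses the C<F_UTF8> constant defined
at the top of this module to test the flags; both the output format and
the variable names are choices made for this illustration only.

    use strict;
    use warnings;
    use PerlIO ();    # loads this module so PerlIO::F_UTF8 is defined

    my @details = PerlIO::get_layers(\*STDOUT, details => 1);
    my $element_count = @details;    # remembered before the list is consumed

    # Consume the list one (name, arguments, flags) triple at a time.
    while (my ($name, $args, $flags) = splice(@details, 0, 3)) {
        $name  = '(unnamed)' unless defined $name;
        $args  = 'undef'     unless defined $args;
        $flags = 0           unless defined $flags;
        my $utf8 = ($flags & PerlIO::F_UTF8()) ? 'yes' : 'no';
        printf "%-10s args=%-10s flags=0x%08x utf8=%s\n",
            $name, $args, $flags, $utf8;
    }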

B<You may open your eyes now.>

=head1 AUTHOR

Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut