package PerlIO;

our $VERSION = '1.02';

# Map layer name to package that defines it
our %alias;

# Called as "use PerlIO 'layer', ...": load the module behind each named layer.
sub import
{
 my $class = shift;
 while (@_)
  {
   my $layer = shift;
   if (exists $alias{$layer})
    {
     $layer = $alias{$layer};
    }
   else
    {
     $layer = "${class}::$layer";
    }
   eval "require $layer";
   warn $@ if $@;
  }
}

# Layer flag bit that marks a layer as carrying UTF-8 data
sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

  open($fh,"<:crlf", "my.txt"); # portably open a text file for reading

  open($fh,"<","his.jpg");      # portably open a binary file for reading
  binmode($fh);

  Shell:
    PERLIO=perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification, the C code performs the equivalent of:

  use PerlIO 'foo';

The perl code in PerlIO.pm then attempts to locate the layer by doing

  require PerlIO::foo;

Otherwise the C<PerlIO> package is a placeholder for additional
PerlIO related functions.
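
The loading sequence described above can be observed from Perl code.
The following is a minimal sketch, not part of the module itself: it
assumes that nothing in the process has loaded C<PerlIO::encoding>
beforehand, and it uses an in-memory filehandle so that no real file is
needed.  The variable names are illustrative only.

```perl
    use strict;
    use warnings;

    # Assumption: PerlIO::encoding has not been loaded yet in this process.
    my $before = exists $INC{'PerlIO/encoding.pm'} ? 1 : 0;

    # ":encoding" is not a built-in layer, so perl performs the equivalent
    # of "use PerlIO 'encoding'", which in turn requires PerlIO::encoding.
    my $data = "hello\n";
    open(my $fh, "<:encoding(UTF-8)", \$data) or die "open failed: $!";
    my $after = exists $INC{'PerlIO/encoding.pm'} ? 1 : 0;
    close($fh);

    print "loaded before open: $before, after open: $after\n";
```

If the module was already loaded for some other reason, both values
will simply be 1.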

The following layers are currently defined:

=over 4

=item unix

Low level layer which calls C<read>, C<write> and C<lseek> etc.

=item stdio

Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc.  Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.

=item perlio

This is a re-implementation of "stdio-like" buffering written as a
PerlIO "layer".  As such it will call whatever layer is below it for
its operations.

=item crlf

A layer which does CRLF to "\n" translation distinguishing "text" and
"binary" files in the manner of MS-DOS and similar operating systems.
(It currently does I<not> mimic MS-DOS in treating Control-Z
as an end-of-file marker.)

=item utf8

Declares that the stream accepts perl's internal encoding of
characters.  (Which really is UTF-8 on ASCII machines, but is
UTF-EBCDIC on EBCDIC machines.)  This allows any character perl can
represent to be read from or written to the stream. The UTF-X encoding
is chosen to render simple text parts (i.e.  non-accented letters,
digits and common punctuation) human readable in the encoded file.

Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
and then read it back in.

    open(F, ">:utf8", "data.utf");
    print F $out;
    close(F);

    open(F, "<:utf8", "data.utf");
    $in = <F>;
    close(F);

=item bytes

This is the inverse of the C<:utf8> layer. It turns off the flag
on the layer below so that data read from it is considered to
be "octets" i.e. characters in range 0..255 only. Likewise
on output perl will warn if a "wide" character is written
to such a stream.

=item raw

The C<:raw> layer is I<defined> as being identical to calling
C<binmode($fh)> - the stream is made suitable for passing binary data
i.e. each byte is passed as-is. The stream will still be
buffered. Unlike in earlier versions of Perl, C<:raw> is I<not>
just the inverse of C<:crlf> - other layers which would affect the
binary nature of the stream are also removed or disabled.

The implementation of C<:raw> is as a pseudo-layer which when "pushed"
pops itself and then any layers which do not declare themselves as suitable
for binary data. (Undoing C<:utf8> and C<:crlf> is implemented by clearing
flags rather than popping layers, but that is an implementation detail.)

As a consequence of the fact that C<:raw> normally pops layers
it usually only makes sense to have it as the only or first element in
a layer specification.  When used as the first element it provides
a known base on which to build e.g.

    open($fh,":raw:utf8",...)

will construct a "binary" stream, but then enable UTF-8 translation.

=item pop

A pseudo layer that removes the top-most layer. Gives perl code
a way to manipulate the layer stack. Should be considered
as experimental. Note that C<:pop> only works on real layers
and will not undo the effects of pseudo layers like C<:utf8>.
An example of a possible use might be:

    open($fh,...)
    ...
    binmode($fh,":encoding(...)");  # next chunk is encoded
    ...
    binmode($fh,":pop");            # back to un-encoded

A more elegant (and safer) interface is needed.

=back
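
To make the C<:crlf> and C<:raw> descriptions above more concrete, here
is a small sketch that writes a file through C<:crlf> and then reads it
back twice, once untranslated and once translated.  The scratch file
comes from C<File::Temp> and all variable names are hypothetical; the
sketch assumes the C<PERLIO> environment variable is not set to
something unusual.

```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Hypothetical scratch file; File::Temp removes it at program exit.
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close($tmp);

    # Write two lines through :crlf, so each "\n" is stored as CRLF.
    open(my $out, ">:crlf", $path) or die "open for write failed: $!";
    print $out "one\ntwo\n";
    close($out);

    # Read the bytes back untranslated.
    open(my $raw, "<:raw", $path) or die "open for read failed: $!";
    my $bytes = do { local $/; <$raw> };
    close($raw);

    # Read again through :crlf, which maps CRLF back to "\n".
    open(my $text, "<:crlf", $path) or die "open for read failed: $!";
    my @lines = <$text>;
    close($text);

    printf "%d bytes on disk, %d lines read\n", length($bytes), scalar @lines;
```

The raw read sees the carriage returns that C<:crlf> added on output,
while the translated read sees plain "\n" line endings again.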

=head2 Custom Layers

It is possible to write custom layers in addition to the above builtin
ones, both in C/XS and Perl.  Two such layers (and one example written
in Perl using the latter) come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> either in open() or binmode() to install
a layer that transparently performs character set and encoding
transformations, for example from Shift-JIS to Unicode.  Note that under
C<stdio> an C<:encoding> also enables C<:utf8>.  See L<PerlIO::encoding>
for more information.

=item :via

Use C<:via(MODULE)> either in open() or binmode() to install a layer
that performs an arbitrary transformation (for example compression /
decompression, encryption / decryption) on the filehandle.
See L<PerlIO::via> for more information.

=back
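
As an illustration of the C<:via> mechanism, the following sketch
defines a tiny output-only layer in Perl.  The package name
C<PerlIO::via::UpperSketch> is hypothetical and exists only for this
example; it implements just the C<PUSHED> and C<WRITE> callbacks
documented in L<PerlIO::via>, and it is not one of the layers shipped
with Perl.

```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Hypothetical layer, named only for this sketch: upper-cases output.
    package PerlIO::via::UpperSketch;

    # Called when the layer is pushed; returns the layer object.
    sub PUSHED {
        my ($class, $mode, $fh) = @_;
        my $state = '';
        return bless \$state, $class;
    }

    # Called with each buffer being written; $fh is the layer below.
    sub WRITE {
        my ($self, $buf, $fh) = @_;
        print {$fh} uc($buf) or return -1;
        return length($buf);
    }

    package main;

    my ($tmp, $path) = tempfile(UNLINK => 1);
    close($tmp);

    open(my $out, ">:via(UpperSketch)", $path) or die "open failed: $!";
    print $out "hello, layers\n";
    close($out);

    # Read the file back without the custom layer to see what was stored.
    open(my $in, "<", $path) or die "open failed: $!";
    my $line = <$in>;
    close($in);
    print $line;
```

A layer meant for reading as well would also need a C<FILL> (or
C<READ>) method; see L<PerlIO::via> for the full list of callbacks.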

=head2 Alternatives to raw

An alternative way to get a binary stream is to use:

    open($fh,"whatever");
    binmode($fh);

This has the advantage of being backward compatible with how such things
have had to be coded on some platforms for years.

To get an un-buffered stream specify an unbuffered layer (e.g. C<:unix>)
in the open call:

    open($fh,"<:unix",$path)

=head2 Defaults and how to override them

If the platform is MS-DOS like and normally does CRLF to "\n"
translation for text files then the default layers are:

  unix crlf

(The low level "unix" layer may be replaced by a platform specific low
level layer.)

Otherwise if C<Configure> found out how to do "fast" IO using the
system's stdio, then the default layers are:

  unix stdio

Otherwise the default layers are

  unix perlio

These defaults may change once perlio has been better tested and tuned.

The default can be overridden by setting the environment variable
PERLIO to a space-separated list of layers (C<unix> or platform low
level layer is always pushed first).

This can be used to see the effect of/bugs in the various layers e.g.

  cd .../perl/t
  PERLIO=stdio  ./perl harness
  PERLIO=perlio ./perl harness

For the various values of PERLIO see L<perlrun/PERLIO>.
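
The effect of the PERLIO environment variable can also be checked from
a script.  The sketch below is only an illustration: it starts a child
perl with C<PERLIO=perlio> and asks it to report the layers on its own
STDOUT.  It assumes that the list form of a piped C<open> is available
on the platform, and the variable names are hypothetical.

```perl
    use strict;
    use warnings;

    # Run a child perl with PERLIO set and have it report its STDOUT layers.
    # Assumption: list-form pipe open works on this platform.
    local $ENV{PERLIO} = "perlio";
    my $code = 'print join(" ", PerlIO::get_layers(\*STDOUT, output => 1))';
    open(my $child, "-|", $^X, "-e", $code) or die "cannot run $^X: $!";
    my $layers = <$child>;
    close($child);

    print "child STDOUT layers: ",
        (defined $layers ? $layers : "(none reported)"), "\n";
```

Changing the value assigned to C<$ENV{PERLIO}> and re-running gives a
quick way to compare the resulting default stacks.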

=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

   my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them.  Note that the "default stack" depends on the operating
system and on the Perl version, and both the compile-time and
runtime configurations of Perl.

The following table summarizes the default layers on UNIX-like and
DOS-like platforms, depending on the setting of C<$ENV{PERLIO}>:

 PERLIO     UNIX-like                   DOS-like

 unset / "" unix perlio / stdio [1]     unix crlf
 stdio      unix perlio / stdio [1]     stdio
 perlio     unix perlio                 unix perlio
 mmap       unix mmap                   unix mmap

 # [1] "stdio" if Configure found out how to do "fast stdio" (depends
 # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"

By default the layers from the input side of the filehandle are
returned; to get the output side use the optional C<output> argument:

   my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle but
for example with sockets there may be differences, or if you have
been using the C<open> pragma.)
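
As a small sketch of how get_layers() can be used to watch the stack
change, the following pushes an C<:encoding> layer onto an in-memory
filehandle with binmode(), removes it again with C<:pop>, and records
the layer names at each step.  The variable names are illustrative, and
the exact names printed depend on the platform and Perl configuration.

```perl
    use strict;
    use warnings;

    my $data = "some text\n";
    open(my $fh, "<", \$data) or die "open failed: $!";
    my @plain = PerlIO::get_layers($fh);

    # Push an encoding layer, then look at the stack again.
    binmode($fh, ":encoding(UTF-8)") or die "binmode failed: $!";
    my @encoded = PerlIO::get_layers($fh);

    # Remove the top-most real layer (the encoding layer) again.
    binmode($fh, ":pop") or die "binmode failed: $!";
    my @popped = PerlIO::get_layers($fh);
    close($fh);

    print "initially:      @plain\n";
    print "after encoding: @encoded\n";
    print "after :pop:     @popped\n";
```

Only open() and binmode() change the stack here; get_layers() merely
reports it.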

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that.  This is not
accidental or unintentional.  The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<utf8>) are not real
layers but instead flags on real layers: to get all of these returned
separately use the optional C<details> argument:

   my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will have up to three times as many elements as there are
layers: the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.
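
One way to consume that flat list is to walk it three elements at a
time.  The sketch below is an assumption-laden illustration rather than
an official recipe: it opens a scratch file from C<File::Temp> with
C<:utf8>, groups the result into hypothetical per-layer hashes, and
tests each layer's flags against the C<PerlIO::F_UTF8> constant defined
in this module.

```perl
    use strict;
    use warnings;
    use PerlIO;                      # makes PerlIO::F_UTF8 available
    use File::Temp qw(tempfile);

    # Hypothetical scratch file; it stays empty, only its layers matter.
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close($tmp);

    open(my $fh, "<:utf8", $path) or die "open failed: $!";
    my @details = PerlIO::get_layers($fh, details => 1);
    close($fh);

    # Walk the flat list three elements at a time: name, arguments, flags.
    my @rows;
    for (my $i = 0; $i + 2 < @details; $i += 3) {
        my ($name, $args, $flags) = @details[$i .. $i + 2];
        my $is_utf8 = (defined $flags && ($flags & PerlIO::F_UTF8())) ? 1 : 0;
        push @rows, { name => $name, args => $args, utf8 => $is_utf8 };
    }

    for my $row (@rows) {
        printf "%-8s args=%s utf8=%d\n",
            defined $row->{name} ? $row->{name} : "undef",
            defined $row->{args} ? $row->{args} : "undef",
            $row->{utf8};
    }
```

Because C<utf8> is a flag rather than a real layer, it shows up here in
the flags of the top-most layer instead of as a separate name.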

B<You may open your eyes now.>

=head1 AUTHOR

Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut