=head1 NAME

perlmod - Perl modules (packages and symbol tables)

=head1 DESCRIPTION

=head2 Packages

Perl provides a mechanism for alternative namespaces to protect
packages from stomping on each other's variables.  In fact, there's
really no such thing as a global variable in Perl.  The package
statement declares the compilation unit as being in the given
namespace.  The scope of the package declaration is from the
declaration itself through the end of the enclosing block, C<eval>,
or file, whichever comes first (the same scope as the my() and
local() operators).  Unqualified dynamic identifiers will be in
this namespace, except for those few identifiers that, if unqualified,
default to the main package instead of the current one, as described
below.  A package statement affects only dynamic variables--including
those you've used local() on--but I<not> lexical variables created
with my().  Typically it would be the first declaration in a file
included by the C<do>, C<require>, or C<use> operators.  You can
switch into a package in more than one place; it merely influences
which symbol table is used by the compiler for the rest of that
block.  You can refer to variables and filehandles in other packages
by prefixing the identifier with the package name and a double
colon: C<$Package::Variable>.  If the package name is null, the
C<main> package is assumed.  That is, C<$::sail> is equivalent to
C<$main::sail>.

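As a minimal sketch of the scoping and qualification rules just described
(the package name C<Boat> and the variable values are made up for
illustration), a block-scoped package statement and the three spellings
of a C<main> variable might look like this:

```perl
    package main;
    use strict;
    use warnings;

    our $sail = 'main sail';        # this is $main::sail

    {
        package Boat;               # in effect until the closing brace
        our $sail = 'boat sail';    # this is $Boat::sail
    }

    # Back in package main: $sail, $::sail and $main::sail are one variable,
    # while the variable in the other package needs its package name.
    my @seen = ($sail, $::sail, $main::sail, $Boat::sail);
    print join(', ', @seen), "\n";
```

The first three elements of C<@seen> name the same variable; only the
last one reaches into the other package.
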
The old package delimiter was a single quote, but double colon is now the
preferred delimiter, in part because it's more readable to humans, and
in part because it's more readable to B<emacs> macros.  It also makes C++
programmers feel like they know what's going on--as opposed to using the
single quote as separator, which was there to make Ada programmers feel
like they knew what was going on.  Because the old-fashioned syntax is still
supported for backwards compatibility, if you try to use a string like
C<"This is $owner's house">, you'll be accessing C<$owner::s>; that is,
the $s variable in package C<owner>, which is probably not what you meant.
Use braces to disambiguate, as in C<"This is ${owner}'s house">.

Packages may themselves contain package separators, as in
C<$OUTER::INNER::var>.  This implies nothing about the order of
name lookups, however.  There are no relative packages: all symbols
are either local to the current package, or must be fully qualified
from the outer package name down.  For instance, there is nowhere
within package C<OUTER> that C<$INNER::var> refers to
C<$OUTER::INNER::var>.  C<INNER> refers to a totally
separate global package.

Only identifiers starting with letters (or underscore) are stored
in a package's symbol table.  All other symbols are kept in package
C<main>, including all punctuation variables, like $_.  In addition,
when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV,
ARGVOUT, ENV, INC, and SIG are forced to be in package C<main>,
even when used for other purposes than their built-in ones.  If you
have a package called C<m>, C<s>, or C<y>, then you can't use the
qualified form of an identifier because it would be instead interpreted
as a pattern match, a substitution, or a transliteration.

Variables beginning with underscore used to be forced into package
main, but we decided it was more useful for package writers to be able
to use leading underscore to indicate private variables and method names.
However, variables and functions named with a single C<_>, such as
$_ and C<sub _>, are still forced into the package C<main>.  See also
L<perlvar/"Technical Note on the Syntax of Variable Names">.

C<eval>ed strings are compiled in the package in which the eval() was
compiled.  (Assignments to C<$SIG{}>, however, assume the signal
handler specified is in the C<main> package.  Qualify the signal handler
name if you wish to have a signal handler in a package.)  For an
example, examine F<perldb.pl> in the Perl library.  It initially switches
to the C<DB> package so that the debugger doesn't interfere with variables
in the program you are trying to debug.  At various points, however, it
temporarily switches back to the C<main> package to evaluate various
expressions in the context of the C<main> package (or wherever you came
from).  See L<perldebug>.

The special symbol C<__PACKAGE__> contains the current package, but cannot
(easily) be used to construct variable names.

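To illustrate why the word "easily" is there, here is a small sketch
(the package C<Some::Counter> and its C<$count> variable are hypothetical).
The name has to be assembled as a string and used as a symbolic
reference, which C<use strict> forbids unless it is relaxed locally:

```perl
    package Some::Counter;
    use strict;
    use warnings;

    our $count = 3;

    # __PACKAGE__ is replaced at compile time by the current package name.
    my $pkg = __PACKAGE__;

    # Writing "$__PACKAGE__::count" would instead name a variable in a
    # package literally called __PACKAGE__, so build the name as a string
    # and dereference it symbolically.
    my $value = do {
        no strict 'refs';
        ${ __PACKAGE__ . '::count' };
    };

    print "$pkg has count $value\n";
```
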
See L<perlsub> for other scoping issues related to my() and local(),
and L<perlref> regarding closures.

=head2 Symbol Tables

The symbol table for a package happens to be stored in the hash of that
name with two colons appended.  The main symbol table's name is thus
C<%main::>, or C<%::> for short.  Likewise the symbol table for the nested
package mentioned earlier is named C<%OUTER::INNER::>.

The value in each entry of the hash is what you are referring to when you
use the C<*name> typeglob notation.  In fact, the following have the same
effect, though the first is more efficient because it does the symbol
table lookups at compile time:

    local *main::foo    = *main::bar;
    local $main::{foo}  = $main::{bar};

(Be sure to note the B<vast> difference between the second line above
and C<local $main::foo = $main::bar>. The former is accessing the hash
C<%main::>, which is the symbol table of package C<main>. The latter is
simply assigning scalar C<$bar> in package C<main> to scalar C<$foo> of
the same package.)

You can use this to print out all the variables in a package, for
instance.  The standard but antiquated F<dumpvar.pl> library and
the CPAN module Devel::Symdump make use of this.

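A rough sketch of such a walk follows.  It is not how F<dumpvar.pl> or
Devel::Symdump are implemented; the package C<Zoo> and the helper
C<describe_package> are invented for this example, and the globs are
looked up by name so that the code does not depend on how a given perl
stores entries in the symbol table hash:

```perl
    use strict;
    use warnings;

    package Zoo;
    our $keeper  = 'Pat';
    our @animals = qw(emu yak);
    sub feed { return 'fed' }

    package main;

    # Report which slots of each typeglob in the named package are in use.
    sub describe_package {
        my ($pkg) = @_;
        my %found;
        no strict 'refs';
        for my $name (sort keys %{"${pkg}::"}) {
            my $glob = \*{"${pkg}::$name"};
            my @kinds;
            # The SCALAR slot always yields a reference, so test its value.
            push @kinds, 'scalar' if defined ${ *{$glob}{SCALAR} };
            push @kinds, 'array'  if *{$glob}{ARRAY};
            push @kinds, 'hash'   if *{$glob}{HASH};
            push @kinds, 'code'   if *{$glob}{CODE};
            $found{$name} = join ',', @kinds;
        }
        return \%found;
    }

    my $zoo = describe_package('Zoo');
    print "$_ => $zoo->{$_}\n" for sort keys %$zoo;
```
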
Assignment to a typeglob performs an aliasing operation, i.e.,

    *dick = *richard;

causes variables, subroutines, formats, and file and directory handles
accessible via the identifier C<richard> also to be accessible via the
identifier C<dick>.  If you want to alias only a particular variable or
subroutine, assign a reference instead:

    *dick = \$richard;

Which makes $richard and $dick the same variable, but leaves
@richard and @dick as separate arrays.  Tricky, eh?

There is one subtle difference between the following statements:

    *foo = *bar;
    *foo = \$bar;

C<*foo = *bar> makes the typeglobs themselves synonymous while
C<*foo = \$bar> makes the SCALAR portions of two distinct typeglobs
refer to the same scalar value. This means that the following code:

    $bar = 1;
    *foo = \$bar;       # Make $foo an alias for $bar

    {
        local $bar = 2; # Restrict changes to block
        print $foo;     # Prints '1'!
    }

Would print '1', because C<$foo> holds a reference to the I<original>
C<$bar> -- the one that was stuffed away by C<local()> and which will be
restored when the block ends. Because variables are accessed through the
typeglob, you can use C<*foo = *bar> to create an alias which can be
localized. (But be aware that this means you can't have a separate
C<@foo> and C<@bar>, etc.)

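The two kinds of alias can be compared side by side.  This is only a
sketch; the names C<$ref_alias> and C<$glob_alias> are placeholders
chosen for the example:

```perl
    use strict;
    use warnings;

    our ($bar, $ref_alias, $glob_alias);
    our @seen;

    $bar = 1;
    *ref_alias  = \$bar;    # shares only the scalar that $bar holds right now
    *glob_alias = *bar;     # shares the whole typeglob

    {
        local $bar = 2;
        # The reference alias still sees the saved original; the glob
        # alias follows the localized value.
        @seen = ($ref_alias, $glob_alias);
    }

    print "@seen\n";
```
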
What makes all of this important is that the Exporter module uses glob
aliasing as the import/export mechanism. Whether or not you can properly
localize a variable that has been exported from a module depends on how
it was exported:

    @EXPORT = qw($FOO); # Usual form, can't be localized
    @EXPORT = qw(*FOO); # Can be localized

You can work around the first case by using the fully qualified name
(C<$Package::FOO>) where you need a local value, or by overriding it
by saying C<*FOO = *Package::FOO> in your script.

The C<*x = \$y> mechanism may be used to pass and return cheap references
into or from subroutines if you don't want to copy the whole
thing.  It only works when assigning to dynamic variables, not
lexicals.

    %some_hash = ();                    # can't be my()
    *some_hash = fn( \%another_hash );
    sub fn {
        local *hashsym = shift;
        # now use %hashsym normally, and you
        # will affect the caller's %another_hash
        my %nhash = (); # do what you want
        return \%nhash;
    }

On return, the reference will overwrite the hash slot in the
symbol table specified by the *some_hash typeglob.  This
is a somewhat tricky way of passing around references cheaply
when you don't want to have to remember to dereference variables
explicitly.

Another use of symbol tables is for making "constant" scalars.

    *PI = \3.14159265358979;

Now you cannot alter C<$PI>, which is probably a good thing all in all.
This isn't the same as a constant subroutine, which is subject to
optimization at compile-time.  A constant subroutine is one prototyped
to take no arguments and to return a constant expression.  See
L<perlsub> for details on these.  The C<use constant> pragma is a
convenient shorthand for these.

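A short sketch of what "cannot alter" means in practice: reading the
aliased scalar works as usual, while an assignment dies at run time and
can be trapped with C<eval>.  The C<$area>, C<$ok> and C<$error>
variables exist only for this illustration:

```perl
    use strict;
    use warnings;

    our $PI;
    *PI = \3.14159265358979;        # $PI now aliases a read-only literal

    my $area = $PI * 2 ** 2;        # reading works as usual

    # Writing through the alias raises a run-time error.
    my $ok    = eval { $PI = 3; 1 };
    my $error = $ok ? '' : $@;
    print "assignment failed: $error";
```
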
You can say C<*foo{PACKAGE}> and C<*foo{NAME}> to find out what name and
package the *foo symbol table entry comes from.  This may be useful
in a subroutine that gets passed typeglobs as arguments:

    sub identify_typeglob {
        my $glob = shift;
        print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
    }
    identify_typeglob *foo;
    identify_typeglob *bar::baz;

This prints

    You gave me main::foo
    You gave me bar::baz

The C<*foo{THING}> notation can also be used to obtain references to the
individual elements of *foo.  See L<perlref>.

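For instance, assuming a scalar, an array and a subroutine all named
C<foo> (names picked purely for illustration), the individual slots
might be fetched like this:

```perl
    use strict;
    use warnings;

    our $foo = 'scalar';
    our @foo = (1, 2, 3);
    sub foo { return 'code' }

    my $scalar_ref = *foo{SCALAR};   # same as \$foo
    my $array_ref  = *foo{ARRAY};    # same as \@foo
    my $code_ref   = *foo{CODE};     # same as \&foo
    my $hash_ref   = *foo{HASH};     # undef, because %foo was never used

    my @summary = ($$scalar_ref, scalar @$array_ref, $code_ref->(),
                   defined $hash_ref ? 'hash' : 'no hash');
    print join(', ', @summary), "\n";
```
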
Subroutine definitions (and declarations, for that matter) need
not necessarily be situated in the package whose symbol table they
occupy.  You can define a subroutine outside its package by
explicitly qualifying the name of the subroutine:

    package main;
    sub Some_package::foo { ... }   # &foo defined in Some_package

This is just a shorthand for a typeglob assignment at compile time:

    BEGIN { *Some_package::foo = sub { ... } }

and is I<not> the same as writing:

    {
        package Some_package;
        sub foo { ... }
    }

In the first two versions, the body of the subroutine is
lexically in the main package, I<not> in Some_package. So
something like this:

    package main;

    $Some_package::name = "fred";
    $main::name = "barney";

    sub Some_package::foo {
        print "in ", __PACKAGE__, ": \$name is '$name'\n";
    }

    Some_package::foo();

prints:

    in main: $name is 'barney'

rather than:

    in Some_package: $name is 'fred'

This also has implications for the use of the SUPER:: qualifier
(see L<perlobj>).

=head2 BEGIN, CHECK, INIT and END

Four specially named code blocks are executed at the beginning and at the end
of a running Perl program.  These are the C<BEGIN>, C<CHECK>, C<INIT>, and
C<END> blocks.

These code blocks can be prefixed with C<sub> to give the appearance of a
subroutine (although this is not considered good style).  One should note
that these code blocks don't really exist as named subroutines (despite
their appearance). The thing that gives this away is the fact that you can
have B<more than one> of these code blocks in a program, and they will
B<all> be executed at the appropriate moment.  So you can't execute any of
these code blocks by name.

A C<BEGIN> code block is executed as soon as possible, that is, the moment
it is completely defined, even before the rest of the containing file (or
string) is parsed.  You may have multiple C<BEGIN> blocks within a file (or
eval'ed string) -- they will execute in order of definition.  Because a C<BEGIN>
code block executes immediately, it can pull in definitions of subroutines
and such from other files in time to be visible to the rest of the compile
and run time.  Once a C<BEGIN> has run, it is immediately undefined and any
code it used is returned to Perl's memory pool.

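A tiny sketch of that ordering, using a package array C<@order>
(a name invented here) to record when each piece of code runs:

```perl
    use strict;
    use warnings;

    our @order;

    push @order, 'runtime 1';
    BEGIN { push @order, 'BEGIN 1' }    # runs as soon as it is compiled
    push @order, 'runtime 2';
    BEGIN { push @order, 'BEGIN 2' }    # runs before any ordinary statement

    print join(', ', @order), "\n";
```

Both C<BEGIN> blocks record their entries during compilation, in order
of definition, before either of the ordinary C<push> statements runs.
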
Note that C<BEGIN> code blocks B<are> executed inside string
C<eval()>s.  The C<CHECK> and C<INIT> code blocks are B<not> executed inside
a string eval, which can be a problem in, for example, a mod_perl environment.

An C<END> code block is executed as late as possible, that is, after
perl has finished running the program and just before the interpreter
exits, even if it is exiting as a result of a die() function.
(But not if it's polymorphing into another program via C<exec>, or
being blown out of the water by a signal--you have to trap that yourself
(if you can).)  You may have multiple C<END> blocks within a file--they
will execute in reverse order of definition; that is: last in, first
out (LIFO).  C<END> blocks are not executed when you run perl with the
C<-c> switch, or if compilation fails.

Note that C<END> code blocks are B<not> executed at the end of a string
C<eval()>: if any C<END> code blocks are created in a string C<eval()>,
they will be executed just as any other C<END> code block of that package
in LIFO order just before the interpreter exits.

Inside an C<END> code block, C<$?> contains the value that the program is
going to pass to C<exit()>.  You can modify C<$?> to change the exit
value of the program.  Beware of changing C<$?> by accident (e.g. by
running something via C<system>).

C<CHECK> and C<INIT> code blocks are useful to catch the transition between
the compilation phase and the execution phase of the main program.

C<CHECK> code blocks are run just after the B<initial> Perl compile phase ends
and before the run time begins, in LIFO order.  C<CHECK> code blocks are used
in the Perl compiler suite to save the compiled state of the program.

C<INIT> blocks are run just before the Perl runtime begins execution, in
"first in, first out" (FIFO) order. For example, the code generators
documented in L<perlcc> make use of C<INIT> blocks to initialize and
resolve pointers to XSUBs.

When you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and
C<END> work just as they do in B<awk>, as a degenerate case.
Both C<BEGIN> and C<CHECK> blocks are run when you use the B<-c>
switch for a compile-only syntax check, although your main code
is not.

The B<begincheck> program makes it all clear, eventually:

  #!/usr/bin/perl

  # begincheck

  print         " 8. Ordinary code runs at runtime.\n";

  END { print   "14.   So this is the end of the tale.\n" }
  INIT { print  " 5. INIT blocks run FIFO just before runtime.\n" }
  CHECK { print " 4.   So this is the fourth line.\n" }

  print         " 9.   It runs in order, of course.\n";

  BEGIN { print " 1. BEGIN blocks run FIFO during compilation.\n" }
  END { print   "13.   Read perlmod for the rest of the story.\n" }
  CHECK { print " 3. CHECK blocks run LIFO at compilation's end.\n" }
  INIT { print  " 6.   Run this again, using Perl's -c switch.\n" }

  print         "10.   This is anti-obfuscated code.\n";

  END { print   "12. END blocks run LIFO at quitting time.\n" }
  BEGIN { print " 2.   So this line comes out second.\n" }
  INIT { print  " 7.   You'll see the difference right away.\n" }

  print         "11.   It merely _looks_ like it should be confusing.\n";

  __END__

=head2 Perl Classes

There is no special class syntax in Perl, but a package may act
as a class if it provides subroutines to act as methods.  Such a
package may also derive some of its methods from another class (package)
by listing the other package name(s) in its global @ISA array (which
must be a package global, not a lexical).

For more on this, see L<perltoot> and L<perlobj>.

=head2 Perl Modules

A module is just a set of related functions in a library file, i.e.,
a Perl package with the same name as the file.  It is specifically
designed to be reusable by other modules or programs.  It may do this
by providing a mechanism for exporting some of its symbols into the
symbol table of any package using it, or it may function as a class
definition and make its semantics available implicitly through
method calls on the class and its objects, without explicitly
exporting anything.  Or it can do a little of both.

For example, to start a traditional, non-OO module called Some::Module,
create a file called F<Some/Module.pm> and start with this template:

    package Some::Module;  # assumes Some/Module.pm

    use strict;
    use warnings;

    BEGIN {
        use Exporter   ();
        our ($VERSION, @ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);

        # set the version for version checking
        $VERSION     = 1.00;
        # if using RCS/CVS, this may be preferred
        $VERSION = sprintf "%d.%03d", q$Revision: 1.1 $ =~ /(\d+)/g;

        @ISA         = qw(Exporter);
        @EXPORT      = qw(&func1 &func2);
        %EXPORT_TAGS = ( );     # eg: TAG => [ qw!name1 name2! ],

        # your exported package globals go here,
        # as well as any optionally exported functions
        @EXPORT_OK   = qw($Var1 %Hashit &func3);
    }
    our @EXPORT_OK;

    # exported package globals go here
    our $Var1;
    our %Hashit;

    # non-exported package globals go here
    our @more;
    our $stuff;

    # initialize package globals, first exported ones
    $Var1   = '';
    %Hashit = ();

    # then the others (which are still accessible as $Some::Module::stuff)
    $stuff  = '';
    @more   = ();

    # all file-scoped lexicals must be created before
    # the functions below that use them.

    # file-private lexicals go here
    my $priv_var    = '';
    my %secret_hash = ();

    # here's a file-private function as a closure,
    # callable as &$priv_func;  it cannot be prototyped.
    my $priv_func = sub {
        # stuff goes here.
    };

    # make all your functions, whether exported or not;
    # remember to put something interesting in the {} stubs
    sub func1      {}    # no prototype
    sub func2()    {}    # proto'd void
    sub func3($$)  {}    # proto'd to 2 scalars

    # this one isn't exported, but could be called
    # as Some::Module::func4()
    sub func4(\%)  {}    # proto'd to 1 hash ref

    END { }       # module clean-up code here (global destructor)

    ## YOUR CODE GOES HERE

    1;  # don't forget to return a true value from the file

Then go on to declare and use your variables in functions without
any qualifications.  See L<Exporter> and L<perlmodlib> for
details on mechanics and style issues in module creation.

Perl modules are included into your program by saying

    use Module;

or

    use Module LIST;

This is exactly equivalent to

    BEGIN { require Module; import Module; }

or

    BEGIN { require Module; import Module LIST; }

As a special case

    use Module ();

is exactly equivalent to

    BEGIN { require Module; }

All Perl module files have the extension F<.pm>.  The C<use> operator
assumes this so you don't have to spell out "F<Module.pm>" in quotes.
This also helps to differentiate new modules from old F<.pl> and
F<.ph> files.  Module names are also capitalized unless they're
functioning as pragmas; pragmas are in effect compiler directives,
and are sometimes called "pragmatic modules" (or even "pragmata"
if you're a classicist).

The two statements:

    require SomeModule;
    require "SomeModule.pm";

differ from each other in two ways.  In the first case, any double
colons in the module name, such as C<Some::Module>, are translated
into your system's directory separator, usually "/".   The second
case does not, and would have to be specified literally.  The other
difference is that seeing the first C<require> clues in the compiler
that uses of indirect object notation involving "SomeModule", as
in C<$ob = purge SomeModule>, are method calls, not function calls.
(Yes, this really can make a difference.)

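The first difference can be seen with two modules that ship with Perl.
This sketch assumes Text::ParseWords and File::Spec are installed, as
they are in a standard distribution, and checks C<%INC> afterwards to
see which files were loaded:

```perl
    use strict;
    use warnings;

    # Bareword form: "::" becomes the directory separator and ".pm"
    # is appended before @INC is searched.
    require Text::ParseWords;

    # String form: the path is taken literally, so spell it out yourself.
    require "File/Spec.pm";

    # Either way, %INC records the file under its relative path.
    my @loaded = grep { exists $INC{$_} }
                 'Text/ParseWords.pm', 'File/Spec.pm';
    print "loaded: @loaded\n";
```
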
Because the C<use> statement implies a C<BEGIN> block, the importing
of semantics happens as soon as the C<use> statement is compiled,
before the rest of the file is compiled.  This is how it is able
to function as a pragma mechanism, and also how modules are able to
declare subroutines that are then visible as list or unary operators for
the rest of the current file.  This will not work if you use C<require>
instead of C<use>.  With C<require> you can get into this problem:

    require Cwd;                # make Cwd:: accessible
    $here = Cwd::getcwd();

    use Cwd;                    # import names from Cwd::
    $here = getcwd();

    require Cwd;                # make Cwd:: accessible
    $here = getcwd();           # oops! no main::getcwd()

In general, C<use Module ()> is recommended over C<require Module>,
because it determines module availability at compile time, not in the
middle of your program's execution.  An exception would be if two modules
each tried to C<use> each other, and each also called a function from
that other module.  In that case, it's easy to use C<require> instead.

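The same contrast in a runnable form, as a sketch that assumes the core
modules File::Basename and List::Util are available.  The file path and
the numbers are arbitrary, and the C<basename> result shown in the
comment assumes a Unix-style path:

```perl
    use strict;
    use warnings;

    # Loaded at run time with nothing imported: use the qualified name.
    require File::Basename;
    my $base = File::Basename::basename('/tmp/report.txt');   # 'report.txt'

    # Loaded and imported at compile time: the parser already knows
    # max() as a list operator, so the parentheses can be left off.
    use List::Util qw(max);
    my $biggest = max 3, 9, 4;

    print "$base $biggest\n";
```

With only C<require List::Util>, the line using C<max> without
parentheses would not compile, because no C<main::max> would have been
declared by the time the parser reached it.
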
Perl packages may be nested inside other package names, so we can have
package names containing C<::>.  But if we used that package name
directly as a filename it would make for unwieldy or impossible
filenames on some systems.  Therefore, if a module's name is, say,
C<Text::Soundex>, then its definition is actually found in the library
file F<Text/Soundex.pm>.

Perl modules always have a F<.pm> file, but there may also be
dynamically linked executables (often ending in F<.so>) or autoloaded
subroutine definitions (often ending in F<.al>) associated with the
module.  If so, these will be entirely transparent to the user of
the module.  It is the responsibility of the F<.pm> file to load
(or arrange to autoload) any additional functionality.  For example,
although the POSIX module happens to do both dynamic loading and
autoloading, the user can say just C<use POSIX> to get it all.

=head2 Making your module threadsafe

Since 5.6.0, Perl has had support for a new type of threads called
interpreter threads (ithreads). These threads can be used explicitly
and implicitly.

Ithreads work by cloning the data tree so that no data is shared
between different threads. These threads can be used by using the C<threads>
module or by doing fork() on Win32 (fake fork() support). When a
thread is cloned, all Perl data is cloned; non-Perl data, however, cannot
be cloned automatically.  Perl after 5.7.2 has support for the C<CLONE>
special subroutine.  In C<CLONE> you can do whatever you need to do,
such as handling the cloning of non-Perl data, if necessary.
C<CLONE> will be executed once for every package that has it defined
(or inherits it).  It will be called in the context of the new thread,
so all modifications are made in the new area.

If you want to CLONE all objects you will need to keep track of them per
package. This is simply done using a hash and Scalar::Util::weaken().

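One possible shape for such a registry is sketched below.  The class
name C<Some::Handle>, the C<%live> hash and the C<live_count> helper are
all hypothetical, and the place where real non-Perl data would be
duplicated is only marked by a comment:

```perl
    package Some::Handle;
    use strict;
    use warnings;
    use Scalar::Util qw(weaken refaddr);

    my %live;    # refaddr => weak reference to each live object

    sub new {
        my ($class, %args) = @_;
        my $self = bless { %args }, $class;
        # Weak, so the registry itself never keeps an object alive.
        weaken( $live{ refaddr $self } = $self );
        return $self;
    }

    sub DESTROY {
        my ($self) = @_;
        delete $live{ refaddr $self };
    }

    # Called once for this package in each newly created thread.
    sub CLONE {
        my ($class) = @_;
        my @objects = grep { defined } values %live;
        %live = ();
        for my $obj (@objects) {
            # A cloned object lives at a new address, so re-key the
            # registry; non-Perl data would be duplicated here as well.
            weaken( $live{ refaddr $obj } = $obj );
        }
    }

    sub live_count { return scalar keys %live }
```

Because the registry holds only weak references, an object is still
destroyed when the last ordinary reference to it goes away, and
C<DESTROY> then removes its entry.
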
=head1 SEE ALSO

See L<perlmodlib> for general style issues related to building Perl
modules and classes, as well as descriptions of the standard library
and CPAN, L<Exporter> for how Perl's standard import/export mechanism
works, L<perltoot> and L<perltooc> for an in-depth tutorial on
creating classes, L<perlobj> for a hard-core reference document on
objects, L<perlsub> for an explanation of functions and scoping,
and L<perlxstut> and L<perlguts> for more information on writing
extension modules.

561