=head1 NAME

perlsyn - Perl syntax

=head1 DESCRIPTION

A Perl script consists of a sequence of declarations and statements.
The sequence of statements is executed just once, unlike in B<sed>
and B<awk> scripts, where the sequence of statements is executed
for each input line. While this means that you must explicitly
loop over the lines of your input file (or files), it also means
you have much more control over which files and which lines you look at.
(Actually, I'm lying--it is possible to do an implicit loop with
either the B<-n> or B<-p> switch. It's just not the mandatory
default like it is in B<sed> and B<awk>.)

Perl is, for the most part, a free-form language. (The only exception
to this is format declarations, for obvious reasons.) Text from a
C<"#"> character until the end of the line is a comment, and is
ignored. If you attempt to use C</* */> C-style comments, they will be
interpreted either as division or pattern matching, depending on the
context, and C++ C<//> comments just look like a null regular
expression, so don't do that.

=head2 Declarations

The only things you need to declare in Perl are report formats
and subroutines--and even undefined subroutines can be handled
through AUTOLOAD. A variable holds the undefined value (C<undef>)
until it has been assigned a defined value, which is anything
other than C<undef>. When used as a number, C<undef> is treated
as C<0>; when used as a string, it is treated as the empty string,
C<"">; and when used as a reference that isn't being assigned
to, it is treated as an error. If you enable warnings, you'll
be notified of an uninitialized value whenever you treat C<undef>
as a string or a number. Well, usually. Boolean ("don't-care")
contexts and operators such as C<++>, C<-->, C<+=>, C<-=>, and
C<.=> are always exempt from such warnings.
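
As a minimal sketch of the rules above (the variable names are made up
for illustration), the following shows C<undef> acting as C<0> in numeric
context and as C<""> in string context, and two of the exempt operators
working on a variable that was never assigned:

    use strict;
    use warnings;

    my ($count, $name);            # both start out as undef

    my ($sum, $greeting);
    {
        no warnings 'uninitialized';          # keep the demo quiet
        $sum      = $count + 5;               # undef acts as 0
        $greeting = "Hello, " . $name . "!";  # undef acts as ""
    }
    print "$sum\n";        # prints 5
    print "$greeting\n";   # prints "Hello, !"

    my ($hits, $log);
    $hits++;               # ++ is exempt from the warning; $hits is now 1
    $log .= "start";       # so is .=; $log is now "start"
    print "$hits $log\n";  # prints "1 start"

Removing the C<no warnings 'uninitialized'> line should produce an
"uninitialized value" warning for each of the two guarded statements,
while the C<++> and C<.=> lines stay silent either way.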

A declaration can be put anywhere a statement can, but has no effect on
the execution of the primary sequence of statements--declarations all
take effect at compile time. Typically all the declarations are put at
the beginning or the end of the script. However, if you're using
lexically-scoped private variables created with C<my()>, you'll
have to make sure your format or subroutine definition is within the
same block scope as the C<my> if you expect to be able to access those
private variables.

Declaring a subroutine allows a subroutine name to be used as if it were a
list operator from that point forward in the program. You can declare a
subroutine without defining it by saying C<sub name>, thus:

    sub myname;
    $me = myname $0 or die "can't get myname";

Note that myname() functions as a list operator, not as a unary operator;
so be careful to use C<or> instead of C<||> in this case. However, if
you were to declare the subroutine as C<sub myname ($)>, then
C<myname> would function as a unary operator, so either C<or> or
C<||> would work.

Subroutine declarations can also be loaded up with the C<require> statement
or both loaded and imported into your namespace with a C<use> statement.
See L<perlmod> for details on this.

A statement sequence may contain declarations of lexically-scoped
variables, but apart from declaring a variable name, the declaration acts
like an ordinary statement, and is elaborated within the sequence of
statements as if it were an ordinary statement. That means it actually
has both compile-time and run-time effects.

=head2 Simple statements

The only kind of simple statement is an expression evaluated for its
side effects. Every simple statement must be terminated with a
semicolon, unless it is the final statement in a block, in which case
the semicolon is optional.
(A semicolon is still encouraged there if the
block takes up more than one line, because you may eventually add another line.)
Note that there are some operators like C<eval {}> and C<do {}> that look
like compound statements, but aren't (they're just TERMs in an expression),
and thus need an explicit termination if used as the last item in a statement.

Any simple statement may optionally be followed by a I<SINGLE> modifier,
just before the terminating semicolon (or block ending). The possible
modifiers are:

    if EXPR
    unless EXPR
    while EXPR
    until EXPR
    foreach EXPR

The C<if> and C<unless> modifiers have the expected semantics,
presuming you're a speaker of English. The C<foreach> modifier is an
iterator: For each value in EXPR, it aliases C<$_> to the value and
executes the statement. The C<while> and C<until> modifiers have the
usual "C<while> loop" semantics (conditional evaluated first), except
when applied to a C<do>-BLOCK (or to the deprecated C<do>-SUBROUTINE
statement), in which case the block executes once before the
conditional is evaluated. This is so that you can write loops like:

    do {
        $line = <STDIN>;
        ...
    } until $line eq ".\n";

See L<perlfunc/do>. Note also that the loop control statements described
later will I<NOT> work in this construct, because modifiers don't take
loop labels. Sorry. You can always put another block inside of it
(for C<next>) or around it (for C<last>) to do that sort of thing.
For C<next>, just double the braces:

    do {{
        next if $x == $y;
        # do something here
    }} until $x++ > $z;

For C<last>, you have to be more elaborate:

    LOOP: {
        do {
            last if $x == $y**2;
            # do something here
        } while $x++ <= $z;
    }

=head2 Compound statements

In Perl, a sequence of statements that defines a scope is called a block.
Sometimes a block is delimited by the file containing it (in the case
of a required file, or the program as a whole), and sometimes a block
is delimited by the extent of a string (in the case of an eval).

But generally, a block is delimited by curly brackets, also known as braces.
We will call this syntactic construct a BLOCK.

The following compound statements may be used to control flow:

    if (EXPR) BLOCK
    if (EXPR) BLOCK else BLOCK
    if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
    LABEL while (EXPR) BLOCK
    LABEL while (EXPR) BLOCK continue BLOCK
    LABEL for (EXPR; EXPR; EXPR) BLOCK
    LABEL foreach VAR (LIST) BLOCK
    LABEL foreach VAR (LIST) BLOCK continue BLOCK
    LABEL BLOCK continue BLOCK

Note that, unlike C and Pascal, these are defined in terms of BLOCKs,
not statements. This means that the curly brackets are I<required>--no
dangling statements allowed. If you want to write conditionals without
curly brackets there are several other ways to do it. The following
all do the same thing:

    if (!open(FOO)) { die "Can't open $FOO: $!"; }
    die "Can't open $FOO: $!" unless open(FOO);
    open(FOO) or die "Can't open $FOO: $!";     # FOO or bust!
    open(FOO) ? 'hi mom' : die "Can't open $FOO: $!";
                        # a bit exotic, that last one

The C<if> statement is straightforward. Because BLOCKs are always
bounded by curly brackets, there is never any ambiguity about which
C<if> an C<else> goes with. If you use C<unless> in place of C<if>,
the sense of the test is reversed.

The C<while> statement executes the block as long as the expression is
true (does not evaluate to the null string C<""> or C<0> or C<"0">).
The LABEL is optional, and if present, consists of an identifier followed
by a colon. The LABEL identifies the loop for the loop control
statements C<next>, C<last>, and C<redo>.
If the LABEL is omitted, the loop control statement
refers to the innermost enclosing loop. This may include dynamically
looking back your call-stack at run time to find the LABEL. Such
desperate behavior triggers a warning if you use the C<use warnings>
pragma or the B<-w> flag.
Unlike a C<foreach> statement, a C<while> statement never implicitly
localises any variables.

If there is a C<continue> BLOCK, it is always executed just before the
conditional is about to be evaluated again, just like the third part of a
C<for> loop in C. Thus it can be used to increment a loop variable, even
when the loop has been continued via the C<next> statement (which is
similar to the C C<continue> statement).

=head2 Loop Control

The C<next> command is like the C<continue> statement in C; it starts
the next iteration of the loop:

    LINE: while (<STDIN>) {
        next LINE if /^#/;      # discard comments
        ...
    }

The C<last> command is like the C<break> statement in C (as used in
loops); it immediately exits the loop in question. The
C<continue> block, if any, is not executed:

    LINE: while (<STDIN>) {
        last LINE if /^$/;      # exit when done with header
        ...
    }

The C<redo> command restarts the loop block without evaluating the
conditional again. The C<continue> block, if any, is I<not> executed.
This command is normally used by programs that want to lie to themselves
about what was just input.

For example, when processing a file like F</etc/termcap>, your input
lines might end in backslashes to indicate continuation, in which case
you want to skip ahead and get the next record.

    while (<>) {
        chomp;
        if (s/\\$//) {
            $_ .= <>;
            redo unless eof();
        }
        # now process $_
    }

which is Perl short-hand for the more explicitly written version:

    LINE: while (defined($line = <ARGV>)) {
        chomp($line);
        if ($line =~ s/\\$//) {
            $line .= <ARGV>;
            redo LINE unless eof(); # not eof(ARGV)!
        }
        # now process $line
    }

Note that if there were a C<continue> block on the above code, it would get
executed even on discarded lines. This is often used to reset line counters
or C<?pat?> one-time matches.

    # inspired by :1,$g/fred/s//WILMA/
    while (<>) {
        ?(fred)?    && s//WILMA $1 WILMA/;
        ?(barney)?  && s//BETTY $1 BETTY/;
        ?(homer)?   && s//MARGE $1 MARGE/;
    } continue {
        print "$ARGV $.: $_";
        close ARGV  if eof();           # reset $.
        reset       if eof();           # reset ?pat?
    }

If the word C<while> is replaced by the word C<until>, the sense of the
test is reversed, but the conditional is still tested before the first
iteration.

The loop control statements don't work in an C<if> or C<unless>, since
they aren't loops. You can double the braces to make them such, though.

    if (/pattern/) {{
        next if /fred/;
        next if /barney/;
        # do something here
    }}

The form C<while/if BLOCK BLOCK>, available in Perl 4, is no longer
available. Replace any occurrence of C<if BLOCK> by C<if (do BLOCK)>.

=head2 For Loops

Perl's C-style C<for> loop works like the corresponding C<while> loop;
that means that this:

    for ($i = 1; $i < 10; $i++) {
        ...
    }

is the same as this:

    $i = 1;
    while ($i < 10) {
        ...
    } continue {
        $i++;
    }

There is one minor difference: if variables are declared with C<my>
in the initialization section of the C<for>, the lexical scope of
those variables is exactly the C<for> loop (the body of the loop
and the control sections).
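
To make that scoping difference concrete, here is a small sketch (the
variable names are illustrative only). A C<my> variable declared in the
initialization section is private to the loop, so an outer variable of
the same name is left untouched:

    use strict;
    use warnings;

    my $i = 'outer';
    my @seen;
    for (my $i = 1; $i < 4; $i++) {   # this $i exists only for the loop
        push @seen, $i;
    }
    print "@seen\n";   # prints "1 2 3"
    print "$i\n";      # prints "outer"

In the C<while>/C<continue> rewrite, by contrast, the counter has to be
declared before the loop and therefore remains visible after it.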

Besides the normal array index looping, C<for> can lend itself
to many other interesting applications. Here's one that avoids the
problem you get into if you explicitly test for end-of-file on
an interactive file descriptor, causing your program to appear to
hang.

    $on_a_tty = -t STDIN && -t STDOUT;
    sub prompt { print "yes? " if $on_a_tty }
    for ( prompt(); <STDIN>; prompt() ) {
        # do something
    }

=head2 Foreach Loops

The C<foreach> loop iterates over a normal list value and sets the
variable VAR to be each element of the list in turn. If the variable
is preceded with the keyword C<my>, then it is lexically scoped, and
is therefore visible only within the loop. Otherwise, the variable is
implicitly local to the loop and regains its former value upon exiting
the loop. If the variable was previously declared with C<my>, it uses
that variable instead of the global one, but it's still localized to
the loop.

The C<foreach> keyword is actually a synonym for the C<for> keyword, so
you can use C<foreach> for readability or C<for> for brevity. (Or because
the Bourne shell is more familiar to you than I<csh>, so writing C<for>
comes more naturally.) If VAR is omitted, C<$_> is set to each value.

If any element of LIST is an lvalue, you can modify it by modifying
VAR inside the loop. Conversely, if any element of LIST is NOT an
lvalue, any attempt to modify that element will fail. In other words,
the C<foreach> loop index variable is an implicit alias for each item
in the list that you're looping over.

If any part of LIST is an array, C<foreach> will get very confused if
you add or remove elements within the loop body, for example with
C<splice>. So don't do that.

C<foreach> probably won't do what you expect if VAR is a tied or other
special variable. Don't do that either.
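
The aliasing and localization rules above can be seen in a short sketch
(the array and variable names are hypothetical, chosen only for this
illustration):

    use strict;
    use warnings;

    my @prices = (10, 20, 30);
    for my $p (@prices) {
        $p *= 2;               # $p aliases the element, so @prices changes
    }
    print "@prices\n";         # prints "20 40 60"

    my $x = 'before';
    for $x (1, 2, 3) { }       # $x is localized to the loop
    print "$x\n";              # prints "before"

    # Elements that are not lvalues cannot be modified through the alias;
    # the attempt dies, so it is wrapped in eval here.
    my $ok = eval { for my $n (1, 2) { $n++ } 1 };
    print "constant list is read-only\n" unless $ok;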

Examples:

    for (@ary) { s/foo/bar/ }

    for my $elem (@elements) {
        $elem *= 2;
    }

    for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') {
        print $count, "\n"; sleep(1);
    }

    for (1..15) { print "Merry Christmas\n"; }

    foreach $item (split(/:[\\\n:]*/, $ENV{TERMCAP})) {
        print "Item: $item\n";
    }

Here's how a C programmer might code up a particular algorithm in Perl:

    for (my $i = 0; $i < @ary1; $i++) {
        for (my $j = 0; $j < @ary2; $j++) {
            if ($ary1[$i] > $ary2[$j]) {
                last; # can't go to outer :-(
            }
            $ary1[$i] += $ary2[$j];
        }
        # this is where that last takes me
    }

Whereas here's how a Perl programmer more comfortable with the idiom might
do it:

    OUTER: for my $wid (@ary1) {
        INNER: for my $jet (@ary2) {
            next OUTER if $wid > $jet;
            $wid += $jet;
        }
    }

See how much easier this is? It's cleaner, safer, and faster. It's
cleaner because it's less noisy. It's safer because if code gets added
between the inner and outer loops later on, the new code won't be
accidentally executed. The C<next> explicitly iterates the other loop
rather than merely terminating the inner one. And it's faster because
Perl executes a C<foreach> statement more rapidly than it would the
equivalent C<for> loop.

=head2 Basic BLOCKs and Switch Statements

A BLOCK by itself (labeled or not) is semantically equivalent to a
loop that executes once. Thus you can use any of the loop control
statements in it to leave or restart the block. (Note that this is
I<NOT> true in C<eval{}>, C<sub{}>, or, contrary to popular belief,
C<do{}> blocks, which do I<NOT> count as loops.) The C<continue>
block is optional.

The BLOCK construct is particularly nice for doing case
structures.

    SWITCH: {
        if (/^abc/) { $abc = 1; last SWITCH; }
        if (/^def/) { $def = 1; last SWITCH; }
        if (/^xyz/) { $xyz = 1; last SWITCH; }
        $nothing = 1;
    }

There is no official C<switch> statement in Perl, because there are
already several ways to write the equivalent. In addition to the
above, you could write

    SWITCH: {
        $abc = 1, last SWITCH  if /^abc/;
        $def = 1, last SWITCH  if /^def/;
        $xyz = 1, last SWITCH  if /^xyz/;
        $nothing = 1;
    }

(That's actually not as strange as it looks once you realize that you can
use loop control "operators" within an expression. That's just the normal
C comma operator.)

or

    SWITCH: {
        /^abc/ && do { $abc = 1; last SWITCH; };
        /^def/ && do { $def = 1; last SWITCH; };
        /^xyz/ && do { $xyz = 1; last SWITCH; };
        $nothing = 1;
    }

or formatted so it stands out more as a "proper" C<switch> statement:

    SWITCH: {
        /^abc/ && do {
            $abc = 1;
            last SWITCH;
        };

        /^def/ && do {
            $def = 1;
            last SWITCH;
        };

        /^xyz/ && do {
            $xyz = 1;
            last SWITCH;
        };
        $nothing = 1;
    }

or

    SWITCH: {
        /^abc/ and $abc = 1, last SWITCH;
        /^def/ and $def = 1, last SWITCH;
        /^xyz/ and $xyz = 1, last SWITCH;
        $nothing = 1;
    }

or even, horrors,

    if (/^abc/)
        { $abc = 1 }
    elsif (/^def/)
        { $def = 1 }
    elsif (/^xyz/)
        { $xyz = 1 }
    else
        { $nothing = 1 }

A common idiom for a C<switch> statement is to use C<foreach>'s aliasing to make
a temporary assignment to C<$_> for convenient matching:

    SWITCH: for ($where) {
        /In Card Names/     && do { push @flags, '-e'; last; };
        /Anywhere/          && do { push @flags, '-h'; last; };
        /In Rulings/        && do {                    last; };
        die "unknown value for form variable where: `$where'";
    }

Another interesting approach to a switch statement is to arrange
for a C<do> block to return the proper value:


    $amode = do {
        if     ($flag & O_RDONLY) { "r" }       # XXX: isn't this 0?
        elsif  ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" }
        elsif  ($flag & O_RDWR)   {
            if ($flag & O_CREAT)  { "w+" }
            else                  { ($flag & O_APPEND) ? "a+" : "r+" }
        }
    };

Or

    print do {
        ($flags & O_WRONLY) ? "write-only"          :
        ($flags & O_RDWR)   ? "read-write"          :
                              "read-only";
    };

Or if you are certain that the right-hand sides of all the C<&&> clauses
are true, you can use something like this, which "switches" on the value
of the C<HTTP_USER_AGENT> environment variable.

    #!/usr/bin/perl
    # pick out jargon file page based on browser
    $dir = 'http://www.wins.uva.nl/~mes/jargon';
    for ($ENV{HTTP_USER_AGENT}) {
        $page  =    /Mac/            && 'm/Macintrash.html'
                 || /Win(dows )?NT/  && 'e/evilandrude.html'
                 || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html'
                 || /Linux/          && 'l/Linux.html'
                 || /HP-UX/          && 'h/HP-SUX.html'
                 || /SunOS/          && 's/ScumOS.html'
                 ||                     'a/AppendixB.html';
    }
    print "Location: $dir/$page\015\012\015\012";

That kind of switch statement only works when you know the C<&&> clauses
will be true. If you don't, the previous C<?:> example should be used.

You might also consider writing a hash of subroutine references
instead of synthesizing a C<switch> statement.

=head2 Goto

Although not for the faint of heart, Perl does support a C<goto>
statement. There are three forms: C<goto>-LABEL, C<goto>-EXPR, and
C<goto>-&NAME. A loop's LABEL is not actually a valid target for
a C<goto>; it's just the name of the loop.

The C<goto>-LABEL form finds the statement labeled with LABEL and resumes
execution there. It may not be used to go into any construct that
requires initialization, such as a subroutine or a C<foreach> loop. It
also can't be used to go into a construct that is optimized away.
It can be used to go almost anywhere else within the dynamic scope,
including out of subroutines, but it's usually better to use some other
construct such as C<last> or C<die>. The author of Perl has never felt the
need to use this form of C<goto> (in Perl, that is--C is another matter).

The C<goto>-EXPR form expects a label name, whose scope will be resolved
dynamically. This allows for computed C<goto>s per FORTRAN, but isn't
necessarily recommended if you're optimizing for maintainability:

    goto(("FOO", "BAR", "GLARCH")[$i]);

The C<goto>-&NAME form is highly magical, and substitutes a call to the
named subroutine for the currently running subroutine. This is used by
C<AUTOLOAD()> subroutines that wish to load another subroutine and then
pretend that the other subroutine had been called in the first place
(except that any modifications to C<@_> in the current subroutine are
propagated to the other subroutine.) After the C<goto>, not even C<caller()>
will be able to tell that this routine was called first.

In almost all cases like this, it's usually a far, far better idea to use the
structured control flow mechanisms of C<next>, C<last>, or C<redo> instead of
resorting to a C<goto>. For certain applications, the catch and throw pair of
C<eval{}> and die() for exception processing can also be a prudent approach.

=head2 PODs: Embedded Documentation

Perl has a mechanism for intermixing documentation with source code.
While it's expecting the beginning of a new statement, if the compiler
encounters a line that begins with an equal sign and a word, like this

    =head1 Here There Be Pods!

Then that text and all remaining text up through and including a line
beginning with C<=cut> will be ignored. The format of the intervening
text is described in L<perlpod>.

This allows you to intermix your source code
and your documentation text freely, as in

    =item snazzle($)

    The snazzle() function will behave in the most spectacular
    form that you can possibly imagine, not even excepting
    cybernetic pyrotechnics.

    =cut back to the compiler, nuff of this pod stuff!

    sub snazzle($) {
        my $thingie = shift;
        .........
    }

Note that pod translators should look at only paragraphs beginning
with a pod directive (it makes parsing easier), whereas the compiler
actually knows to look for pod escapes even in the middle of a
paragraph. This means that the following secret stuff will be
ignored by both the compiler and the translators.

    $a=3;
    =secret stuff
     warn "Neither POD nor CODE!?"
    =cut back
    print "got $a\n";

You probably shouldn't rely upon the C<warn()> being podded out forever.
Not all pod translators are well-behaved in this regard, and perhaps
the compiler will become pickier.

One may also use pod directives to quickly comment out a section
of code.

=head2 Plain Old Comments (Not!)

Much like the C preprocessor, Perl can process line directives. Using
this, one can control Perl's idea of filenames and line numbers in
error or warning messages (especially for strings that are processed
with C<eval()>). The syntax for this mechanism is the same as for most
C preprocessors: it matches the regular expression
C</^#\s*line\s+(\d+)\s*(?:\s"([^"]+)")?\s*$/> with C<$1> being the line
number for the next line, and C<$2> being the optional filename
(specified within quotes).

There is a fairly obvious gotcha included with the line directive:
Debuggers and profilers will only show the last source line to appear
at a particular line number in a given file. Care should be taken not
to cause line number collisions in code you'd like to debug later.

Here are some examples that you should be able to type into your command
shell:

    % perl
    # line 200 "bzzzt"
    # the `#' on the previous line must be the first char on line
    die 'foo';
    __END__
    foo at bzzzt line 201.

    % perl
    # line 200 "bzzzt"
    eval qq[\n#line 2001 ""\ndie 'foo']; print $@;
    __END__
    foo at - line 2001.

    % perl
    eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@;
    __END__
    foo at foo bar line 200.

    % perl
    # line 345 "goop"
    eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'";
    print $@;
    __END__
    foo at goop line 345.

=cut