=head1 NAME

perlop - Perl operators and precedence

=head1 DESCRIPTION

=head2 Operator Precedence and Associativity

Operator precedence and associativity work in Perl more or less like
they do in mathematics.

I<Operator precedence> means some operators are evaluated before
others. For example, in C<2 + 4 * 5>, the multiplication has higher
precedence so C<4 * 5> is evaluated first yielding C<2 + 20 == 22>
and not C<6 * 5 == 30>.

I<Operator associativity> defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first or the right. For example, in
C<8 - 4 - 2>, subtraction is left associative so Perl evaluates the
expression left to right. C<8 - 4> is evaluated first making the
expression C<4 - 2 == 2> and not C<8 - 2 == 6>.

Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp
    left        &
    left        | ^
    left        &&
    left        ||
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects. See L<overload>.

=head2 Terms and List Operators (Leftward)

A TERM has the highest precedence in Perl.
Terms include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for C<print> which is evaluated (printing
the result of C<$foo & 255>). Then one is added to the return value
of C<print> (usually 1). The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");

See L<Named Unary Operators> for more discussion of this.

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L<"I/O Operators">.

=head2 The Arrow Operator

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.

=head2 Auto-increment and Auto-decrement

"++" and "--" work as in C. That is, if placed before a variable,
they increment or decrement the variable by one before returning the
value, and if placed after, increment or decrement after returning the
value.

    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1

The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment.
If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).

The auto-decrement operator is not magical.

=head2 Exponentiation

Binary "**" is the exponentiation operator. It binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)

=head2 Symbolic Unary Operators

Unary "!" performs logical negation, i.e., "not". See also C<not> for a lower
precedence version of this.

Unary "-" performs arithmetic negation if the operand is numeric. If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned. Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned. One effect of these rules is that C<-bareword> is equivalent
to C<"-bareword">.

Unary "~" performs bitwise negation, i.e., 1's complement. For
example, C<0666 & ~027> is 0640. (See also L<Integer Arithmetic> and
L<Bitwise String Operators>.) Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the & operator to mask off the excess bits.

Unary "+" has no effect whatsoever, even on strings.
It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<Terms and List Operators (Leftward)>.)

Unary "\" creates a reference to whatever follows it. See L<perlreftut>
and L<perlref>. Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.

=head2 Binding Operators

Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
success of the operation. Behavior in list context depends on the particular
operator. See L</"Regexp Quote-Like Operators"> for details.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time.

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.

=head2 Multiplicative Operators

Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" computes the modulus of two numbers. Given integer
operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
C<$a> minus the largest multiple of C<$b> that is not greater than
C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the
smallest multiple of C<$b> that is not less than C<$a> (i.e. the
result will be less than or equal to zero).
Note that when C<use integer> is in scope, "%" gives you direct access
to the modulus operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.

Binary "x" is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses, it repeats the list. If the right operand is zero or
negative, it returns an empty string or an empty list, depending on the
context.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5

=head2 Additive Operators

Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

=head2 Shift Operators

Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L<Integer Arithmetic>.)

Binary ">>" returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L<Integer Arithmetic>.)

Note that both "<<" and ">>" in Perl are implemented directly using
"<<" and ">>" in C. If C<use integer> (see L<Integer Arithmetic>) is
in force then signed C integers are used, else unsigned C integers are
used. Either way, the implementation isn't going to generate results
larger than the size of the integer type Perl was built with (32 bits
or 64 bits).

The result of overflowing the range of the integers is undefined
because it is undefined also in C.
In other words, using 32-bit
integers, C<< 1 << 32 >> is undefined. Shifting by a negative number
of bits is also undefined.

=head2 Named Unary Operators

The various named unary operators are treated as functions with one
argument, with optional parentheses.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example,
because named unary operators are higher precedence than ||:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc. are
treated like named unary operators, but they don't follow this functional
parenthesis rule. That means, for example, that C<-f($file).".bak"> is
equivalent to C<-f "$file.bak">.

See also L<"Terms and List Operators (Leftward)">.

=head2 Relational Operators

Binary "<" returns true if the left argument is numerically less than
the right argument.

Binary ">" returns true if the left argument is numerically greater
than the right argument.

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.
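
As a small illustration of the four numeric comparisons just described,
the following sketch records each result as 1 or 0. The variable names
are illustrative only and not part of Perl. Note that operands which are
numeric strings are converted to numbers before being compared.

```perl
use strict;
use warnings;

# Each comparison is evaluated numerically; results are stored as 1 or 0.
my $lt      = (2 < 10)     ? 1 : 0;   # 2 is numerically less than 10
my $strings = ('2' < '10') ? 1 : 0;   # numeric strings are compared as numbers
my $le      = (10 <= 10)   ? 1 : 0;   # equal values satisfy <=
my $ge      = (9.5 >= 10)  ? 1 : 0;   # 9.5 is not greater than or equal to 10

print "lt=$lt strings=$strings le=$le ge=$ge\n";
```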

Binary "lt" returns true if the left argument is stringwise less than
the right argument.

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.

=head2 Equality Operators

Binary "==" returns true if the left argument is numerically equal to
the right argument.

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.

Binary "<=>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef. NaN is not "<", "==", ">",
"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.

    perl -le '$a = NaN; print "No NaN support here" if $a == $a'
    perl -le '$a = NaN; print "NaN support here" if $a != $a'

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.

Binary "cmp" returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if C<use locale> is in effect. See L<perllocale>.

=head2 Bitwise And

Binary "&" returns its operands ANDed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)
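
A minimal sketch of bit-by-bit ANDing, reusing the C<0666 & ~027>
example given earlier under the unary "~" operator. The variable names
(C<$mode>, C<$umask>, and so on) are illustrative assumptions rather
than anything Perl defines.

```perl
use strict;
use warnings;

my $mode  = 0666;                  # octal literal: rw-rw-rw-
my $umask = 027;                   # bits we want cleared

# AND with the complement of $umask clears exactly those bits.
my $effective = $mode & ~$umask;
printf "%04o\n", $effective;       # prints 0640

# AND with a mask keeps only the bits that are set in the mask.
my $low_byte = 0x1234 & 0xFF;
printf "0x%02x\n", $low_byte;      # prints 0x34
```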

Note that "&" has lower priority than relational operators, so for example
the brackets are essential in a test like

    print "Even\n" if ($x & 1) == 0;

=head2 Bitwise Or and Exclusive Or

Binary "|" returns its operands ORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Binary "^" returns its operands XORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "|" and "^" have lower priority than relational operators, so
for example the brackets are essential in a test like

    print "false\n" if (8 | 2) != 10;

=head2 C-style Logical And

Binary "&&" performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or

Binary "||" performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

The C<||> and C<&&> operators return the last value evaluated
(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
portable way to find out the home directory might be:

    $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
        (getpwuid($<))[7] || die "You're homeless!\n";

In particular, this means that you shouldn't use this
for selecting between two aggregates for assignment:

    @a = @b || @c;              # this is wrong
    @a = scalar(@b) || @c;      # really meant this
    @a = @b ? @b : @c;          # this works fine, though

As more readable alternatives to C<&&> and C<||> when used for
control flow, Perl provides C<and> and C<or> operators (see below).
The short-circuit behavior is identical.
The precedence of "and" and
"or" is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);

Using "or" for assignment is unlikely to do what you want; see below.

=head2 Range Operators

Binary ".." is the range operator, which is really two different
operators depending on the context. In list context, it returns a
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
C<foreach (1..10)> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:

    for (1 .. 1_000_000) {
        # code
    }

The range operator also works on strings, using the magical
auto-increment; see below.

In scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. It doesn't become false till the next time the range operator is
evaluated. It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next
evaluation, as in B<sed>, just use three dots ("...") instead of
two. In all other regards, "..." behaves just like ".." does.

The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state. The precedence is a little lower
than || and &&. The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true. The
sequence number is reset for each range encountered. The final
sequence number in a range has the string "E0" appended to it, which
doesn't affect its numeric value, but gives you something to search
for if you want to exclude the endpoint. You can exclude the
beginning point by waiting for the sequence number to be greater
than 1.

If either operand of scalar ".." is a constant expression,
that operand is considered true if it is equal (C<==>) to the current
input line number (the C<$.> variable).

To be pedantic, the comparison is actually C<int(EXPR) == int(EXPR)>,
but that is only an issue if you use a floating point expression; when
implicitly using C<$.> as described in the previous paragraph, the
comparison is C<int(EXPR) == int($.)> which is only an issue when C<$.>
is set to a floating point value and you are not reading from a file.
Furthermore, C<"span" .. "spat"> or C<2.18 .. 3.14> will not do what
you want in scalar context because each of the operands is evaluated
using its integer representation.

Examples:

As a scalar operator:

    if (101 .. 200) { print; }  # print 2nd hundred lines, short for
                                #   if ($. == 101 .. $. == 200) ...
    next LINE if (1 .. /^$/);   # skip header lines, short for
                                #   ... if ($. == 1 .. /^$/);
    s/^/> / if (/^$/ .. eof()); # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        if ($in_header) {
            # ...
        } else { # in body
            # ...
        }
    } continue {
        close ARGV if eof;      # reset $. each file
    }

As a list operator:

    for (101 .. 200) { print; } # print $_ 100 times
    @foo = @foo[0 .. $#foo];    # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items

The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say

    @alphabet = ('A' .. 'Z');

to get all normal letters of the English alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday];

to get dates with leading zeros. If the final value specified is not
in the sequence that the magical increment would produce, the sequence
goes until the next value would be longer than the final value
specified.

Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
return two elements in list context.

    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);

=head2 Conditional Operator

Ternary "?:" is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:

    $a % 2 ? $a += 10 : $a += 2

Really means this:

    (($a % 2) ? ($a += 10) : $a) += 2

Rather than this:

    ($a % 2) ? ($a += 10) : ($a += 2)

That should probably be written more simply as:

    $a += ($a % 2) ? 10 : 2;

=head2 Assignment Operators

"=" is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie(). Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
                 x=

Although these are grouped by family, they all have the precedence
of assignment.

Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
then modifying the variable that was assigned to. This is useful
for modifying a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;

Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
the number of elements produced by the expression on the right hand
side of the assignment.

=head2 Comma Operator

Binary "," is the comma operator. In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.

In list context, it's just the list argument separator, and inserts
both its arguments into the list.
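
The two behaviours of the comma can be seen side by side in the sketch
below. The helper C<step> and the C<@log> array are hypothetical names
introduced only for this illustration; C<step> records that it was
called and returns the value it was given.

```perl
use strict;
use warnings;

my @log;

# Hypothetical helper: remember the call order, return the given value.
sub step {
    my ($name, $value) = @_;
    push @log, $name;
    return $value;
}

# Scalar context: both operands are evaluated, only the right value is kept.
my $scalar = (step('left', 10), step('right', 20));

# List context: the comma merely separates the elements of the list.
my @list = (step('a', 1), step('b', 2));

print "scalar=$scalar list=@list log=@log\n";
# prints: scalar=20 list=1 2 log=left right a b
```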

The C<< => >> operator is a synonym for the comma, but forces any word
to its left to be interpreted as a string (as of 5.001). It is helpful
in documenting the correspondence between keys and values in hashes,
and other paired elements in lists.

=head2 List Operators (Rightward)

On the right side of a list operator, the comma has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
"and", "or", and "not", which may be used to evaluate calls to list
operators without the need for extra parentheses:

    open HANDLE, "filename"
        or die "Can't open: $!\n";

See also discussion of list operators in L<Terms and List Operators (Leftward)>.

=head2 Logical Not

Unary "not" returns the logical negation of the expression to its right.
It's the equivalent of "!" except for the very low precedence.

=head2 Logical And

Binary "and" returns the logical conjunction of the two surrounding
expressions. It's equivalent to && except for the very low
precedence. This means that it short-circuits: i.e., the right
expression is evaluated only if the left expression is true.

=head2 Logical or and Exclusive Or

Binary "or" returns the logical disjunction of the two surrounding
expressions. It's equivalent to || except for the very low precedence.
This makes it useful for control flow:

    print FH $data              or die "Can't write to FH: $!";

This means that it short-circuits: i.e., the right expression is evaluated
only if the left expression is false. Due to its precedence, you should
probably avoid using this for assignment, only for control flow.

    $a = $b or $c;              # bug: this is wrong
    ($a = $b) or $c;            # really means this
    $a = $b || $c;              # better written this way

However, when it's a list-context assignment and you're trying to use
"||" for control flow, you probably need "or" so that the assignment
takes higher precedence.

    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due

Then again, you could always use parentheses.

Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short circuit, of course.

=head2 C Operators Missing From Perl

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator. (But see the "\" operator for taking a reference.)

=item unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)

=item (TYPE)

Type-casting operator.

=back

=head2 Quote and Quote-like Operators

While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities. Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them. In the following table, a C<{}> represents
any pair of delimiters you choose.

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.
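
A brief sketch of a few rows of the table, assuming nothing beyond core
Perl. The variable names and sample strings are made up for
illustration.

```perl
use strict;
use warnings;

my $name = 'world';

my $literal = q{Hello, $name};        # like '...': $name is NOT interpolated
my $interp  = qq{Hello, $name};       # like "...": $name IS interpolated
my @words   = qw{alpha beta gamma};   # list of three words, no interpolation

# Any delimiter may be chosen, which avoids escaping embedded quotes.
my $bang = q!it's "quoted"!;

# m{} is the generic form of //, and interpolates like "...".
my $matched = $interp =~ m{^Hello, $name$} ? 1 : 0;

print "$literal\n$interp\n@words\n$bang\n$matched\n";
```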

Non-bracketing delimiters use the same character fore and aft, but the four
sorts of brackets (round, angle, square, curly) will all nest, which means
that

    q{foo{bar}baz}

is the same as

    'foo{bar}baz'

Note, however, that this does not always work for quoting Perl code:

    $s = q{ if($a eq "}") ... }; # WRONG

is a syntax error. The C<Text::Balanced> module (from CPAN, and
starting from Perl 5.8 part of the standard distribution) is able
to do this properly.

There can be whitespace between the operator and the quoting
characters, except when C<#> is being used as the quoting character.
C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
operator C<q> followed by a comment. Its argument will be taken
from the next line. This allows you to write:

    s {foo}  # Replace foo
      {bar}  # with bar.

The following escape sequences are available in constructs that interpolate
and in transliterations.

    \t          tab             (HT, TAB)
    \n          newline         (NL)
    \r          return          (CR)
    \f          form feed       (FF)
    \b          backspace       (BS)
    \a          alarm (bell)    (BEL)
    \e          escape          (ESC)
    \033        octal char      (ESC)
    \x1b        hex char        (ESC)
    \x{263a}    wide hex char   (SMILEY)
    \c[         control char    (ESC)
    \N{name}    named Unicode character

B<NOTE>: Unlike C and other languages, Perl has no \v escape sequence for
the vertical tab (VT - ASCII 11).

The following escape sequences are available in constructs that interpolate
but not in transliterations.

    \l          lowercase next char
    \u          uppercase next char
    \L          lowercase till \E
    \U          uppercase till \E
    \E          end case modification
    \Q          quote non-word characters till \E

If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
C<\u> and C<\U> is taken from the current locale. See L<perllocale>.
If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
C<\U> is as defined by Unicode. For documentation of C<\N{name}>,
see L<charnames>.

All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline". There is no such thing as an unvarying, physical
newline character. It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve. Not all
systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
on a Mac, these are reversed, and on systems without a line terminator,
printing C<"\n"> may emit no actual data. In general, use C<"\n"> when
you mean a "newline" for your system, but use the literal ASCII when you
need an exact character. For example, most networking protocols expect
and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
and although they often accept just C<"\012">, they seldom tolerate just
C<"\015">. If you get in the habit of using C<"\n"> for networking,
you may be burned some day.

For constructs that do interpolate, variables beginning with "C<$>"
or "C<@>" are interpolated. Subscripted variables such as C<$a[3]> or
C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
But method calls such as C<< $obj->meth >> are not.

Interpolating an array or slice interpolates the elements in order,
separated by the value of C<$">, so is equivalent to interpolating
C<join $", @array>. "Punctuation" arrays such as C<@+> are only
interpolated if the name is enclosed in braces C<@{+}>.

You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
An unescaped C<$> or C<@> interpolates the corresponding variable,
while escaping will cause the literal string C<\$> to be inserted.
You'll need to write something like C<m/\Quser\E\@\Qhost/>.
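
The interpolation rules above can be sketched as follows. The variables
C<@parts>, C<$user> and C<$host> and their values are assumptions made
up for this example; the pattern follows the C<\Q...\E\@\Q...> shape
suggested in the text.

```perl
use strict;
use warnings;

my @parts = ('a', 'b', 'c');

# Array elements are joined with the value of $" (a single space by default).
my $default = "@parts";

my $custom;
{
    local $" = '-';                # temporarily change the list separator
    $custom = "@parts";
}

my $user = 'j.doe';
my $host = 'example.com';

# \Q...\E quotes regex metacharacters in the interpolated values;
# the literal @ sits outside the \Q sequences and is escaped on its own.
my $re = qr/^\Q$user\E\@\Q$host\E\z/;

my $exact = 'j.doe@example.com' =~ $re ? 1 : 0;   # the dot matches literally
my $loose = 'jxdoe@example.com' =~ $re ? 1 : 0;   # "." is not a wildcard here

print "$default $custom $exact $loose\n";         # prints: a b c a-b-c 1 0
```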

Patterns are subject to an additional level of interpretation as a
regular expression.  This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
pattern from the variables.  If this is not what you want, use C<\Q> to
interpolate a variable literally.

Apart from the behavior described above, Perl does not expand
multiple levels of interpolation.  In particular, contrary to the
expectations of shell programmers, back-quotes do I<NOT> interpolate
within double quotes, nor do single quotes impede evaluation of
variables when used within double quotes.

=head2 Regexp Quote-Like Operators

Here are the quote-like operators that apply to pattern
matching and related activities.

=over 8

=item ?PATTERN?

This is just like the C</pattern/> search, except that it matches only
once between calls to the reset() operator.  This is a useful
optimization when you want to see only the first occurrence of
something in each file of a set of files, for instance.  Only C<??>
patterns local to the current package are reset.

    while (<>) {
        if (?^$?) {
                            # blank line between header and body
        }
    } continue {
        reset if eof;       # clear ?? status for next file
    }

This usage is vaguely deprecated, which means it just might possibly
be removed in some distant future version of Perl, perhaps somewhere
around the year 2168.

=item m/PATTERN/cgimosx

=item /PATTERN/cgimosx

Searches a string for a pattern match, and in scalar context returns
true if it succeeds, false if it fails.  If no string is specified
via the C<=~> or C<!~> operator, the $_ string is searched.  (The
string specified with C<=~> need not be an lvalue--it may be the
result of an expression evaluation, but remember the C<=~> binds
rather tightly.)  See also L<perlre>.
See L<perllocale> for
discussion of additional considerations that apply when C<use locale>
is in effect.

Options are:

    c   Do not reset search position on a failed match when /g is in effect.
    g   Match globally, i.e., find all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.

If "/" is the delimiter then the initial C<m> is optional.  With the C<m>
you can use any pair of non-alphanumeric, non-whitespace characters
as delimiters.  This is particularly useful for matching path names
that contain "/", to avoid LTS (leaning toothpick syndrome).  If "?" is
the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
If "'" is the delimiter, no interpolation is performed on the PATTERN.

PATTERN may contain variables, which will be interpolated (and the
pattern recompiled) every time the pattern search is evaluated, except
for when the delimiter is a single quote.  (Note that C<$(>, C<$)>, and
C<$|> are not interpolated because they look like end-of-string tests.)
If you want such a pattern to be compiled only once, add a C</o> after
the trailing delimiter.  This avoids expensive run-time recompilations,
and is useful when the value you are interpolating won't change over
the life of the script.  However, mentioning C</o> constitutes a promise
that you won't change the variables in the pattern.  If you change them,
Perl won't even notice.  See also L<"qr/STRING/imosx">.

If the PATTERN evaluates to the empty string, the last
I<successfully> matched regular expression is used instead.  In this
case, only the C<g> and C<c> flags on the empty pattern are honoured -
the other flags are taken from the original pattern.
If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match).

If the C</g> option is not used, C<m//> in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, i.e., (C<$1>, C<$2>, C<$3>...).  (Note that here C<$1> etc. are
also set, and that this differs from Perl 4's behavior.)  When there are
no parentheses in the pattern, the return value is the list C<(1)> for
success.  With or without parentheses, an empty list is returned upon
failure.

Examples:

    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the
remainder of the line, and assigns those three fields to $F1, $F2, and
$Etc.  The conditional is true if any variables were assigned, i.e., if
the pattern matched.

The C</g> modifier specifies global pattern matching--that is,
matching as many times as possible within the string.  How it behaves
depends on the context.  In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
expression.  If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.

In scalar context, each execution of C<m//g> finds the next match,
returning true if it matches, and false if there is no further match.
The position after the last match can be read or set using the pos()
function; see L<perlfunc/pos>.
A failed match normally resets the
search position to the beginning of the string, but you can avoid that
by adding the C</c> modifier (e.g. C<m//gc>).  Modifying the target
string also resets the search position.

You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
zero-width assertion that matches the exact position where the previous
C<m//g>, if any, left off.  Without the C</g> modifier, the C<\G> assertion
still anchors at pos(), but the match is of course only attempted once.
Using C<\G> without C</g> on a target string that has not previously had a
C</g> match applied to it is the same as using the C<\A> assertion to match
the beginning of the string.  Note also that, currently, C<\G> is only
properly supported when anchored at the very beginning of the pattern.

Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = "";
    while (defined($paragraph = <>)) {
        while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";

    # using m//gc with \G
    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
    Final: 'q', pos=8

Notice that the final match matched C<q> instead of C<p>, which a match
without the C<\G> anchor would have done.  Also note that the final match
did not update C<pos> -- C<pos> is only updated on a C</g> match.  If the
final match did indeed match C<p>, it's a good bet that you're running an
older (pre-5.6.0) Perl.
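
A short sketch of how C<pos()> interacts with C</g> and C</c>, using a
made-up string; the comments follow the rules described above:

    my $str = "aXbXc";
    my @ends;
    while ($str =~ /X/g) {
        push @ends, pos($str);      # offset just past each match: 2, then 4
    }
    # The failed final match reset the position, so pos($str) is undef here.

    $str =~ /X/gc;                  # matches the first X; pos($str) is 2
    $str =~ /Z/gc;                  # fails, but /c leaves pos($str) at 2
    my $at_b  = $str =~ /\Gb/gc;    # true: "b" sits exactly at offset 2
    my $where = pos($str);          # 3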

A useful idiom for C<lex>-like scanners is C</\G.../gc>.  You can
combine several regexps like this to process a string part-by-part,
doing different actions depending on which regexp matched.  Each
regexp tries to match where the previous one leaves off.

    $_ = <<'EOL';
         $url = new URI::URL "http://www/";   die if $url eq "xXx";
    EOL
    LOOP:
       {
         print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/gc;
         print(" lowercase"),    redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
         print(" UPPERCASE"),    redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
         print(" Capitalized"),  redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
         print(" MiXeD"),        redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
         print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
         print(" line-noise"),   redo LOOP if /\G[^A-Za-z0-9]+/gc;
         print ". That's all!\n";
       }

Here is the output (split into several lines):

    line-noise lowercase line-noise lowercase UPPERCASE line-noise
    UPPERCASE line-noise lowercase line-noise lowercase line-noise
    lowercase lowercase line-noise lowercase lowercase line-noise
    MiXeD line-noise. That's all!

=item q/STRING/

=item C<'STRING'>

A single-quoted, literal string.  A backslash represents a backslash
unless followed by the delimiter or another backslash, in which case
the delimiter or backslash is interpolated.

    $foo = q!I said, "You said, 'She said it.'"!;
    $bar = q('This is it.');
    $baz = '\n';                # a two-character string

=item qq/STRING/

=item "STRING"

A double-quoted, interpolated string.

    $_ .= qq
     (*** The previous line contains the naughty word "$1".\n)
                if /\b(tcl|java|python)\b/i;      # :-)
    $baz = "\n";                # a one-character string

=item qr/STRING/imosx

This operator quotes (and possibly compiles) its I<STRING> as a regular
expression.
I<STRING> is interpolated the same way as I<PATTERN>
in C<m/PATTERN/>.  If "'" is used as the delimiter, no interpolation
is done.  Returns a Perl value which may be used instead of the
corresponding C</STRING/imosx> expression.

For example,

    $rex = qr/my.STRING/is;
    s/$rex/foo/;

is equivalent to

    s/my.STRING/foo/is;

The result may be used as a subpattern in a match:

    $re = qr/$pattern/;
    $string =~ /foo${re}bar/;   # can be interpolated in other patterns
    $string =~ $re;             # or used standalone
    $string =~ /$re/;           # or this way

Since Perl may compile the pattern at the moment of execution of the qr()
operator, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:

    sub match {
        my $patterns = shift;
        my @compiled = map qr/$_/i, @$patterns;
        grep {
            my $success = 0;
            foreach my $pat (@compiled) {
                $success = 1, last if /$pat/;
            }
            $success;
        } @_;
    }

Precompilation of the pattern into an internal representation at
the moment of qr() avoids a need to recompile the pattern every
time a match C</$pat/> is attempted.  (Perl has many other internal
optimizations, but none would be triggered in the above example if
we did not use the qr() operator.)

Options are:

    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.

See L<perlre> for additional information on valid syntax for STRING, and
for a detailed look at the semantics of regular expressions.

=item qx/STRING/

=item `STRING`

A string which is (possibly) interpolated and then executed as a
system command with C</bin/sh> or its equivalent.  Shell wildcards,
pipes, and redirections will be honored.
The collected standard
output of the command is returned; standard error is unaffected.  In
scalar context, it comes back as a single (potentially multi-line)
string, or undef if the command failed.  In list context, returns a
list of lines (however you've defined lines with $/ or
$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.

Because backticks do not affect standard error, use shell file descriptor
syntax (assuming the shell supports this) if you care to address this.
To capture a command's STDERR and STDOUT together:

    $output = `cmd 2>&1`;

To capture a command's STDOUT but discard its STDERR:

    $output = `cmd 2>/dev/null`;

To capture a command's STDERR but discard its STDOUT (ordering is
important here):

    $output = `cmd 2>&1 1>/dev/null`;

To exchange a command's STDOUT and STDERR in order to capture the STDERR
but leave its STDOUT to come out the old STDERR:

    $output = `cmd 3>&1 1>&2 2>&3 3>&-`;

To read both a command's STDOUT and its STDERR separately, it's easiest
to redirect them separately to files, and then read from those files
when the program is done:

    system("program args 1>program.stdout 2>program.stderr");

Using single-quote as a delimiter protects the command from Perl's
double-quote interpolation, passing it on to the shell instead:

    $perl_info  = qx(ps $$);            # that's Perl's $$
    $shell_info = qx'ps $$';            # that's the new shell's $$

How that string gets evaluated is entirely subject to the command
interpreter on your system.  On most platforms, you will have to protect
shell metacharacters if you want them treated literally.  This is in
practice difficult to do, as it's unclear how to escape which characters.
See L<perlsec> for a clean and safe example of a manual fork() and exec()
to emulate backticks safely.
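
When the command and its arguments are already separate strings, one way
to sidestep shell quoting is the list form of a piped C<open>, which runs
the program directly without a shell on platforms that support it.  This is
only a sketch: the child here is the running perl itself (C<$^X>) with a
made-up one-line program, standing in for a real command.

    my @cmd = ($^X, '-e', 'print "hi there\n"; print "lo there\n"');
    open(my $fh, '-|', @cmd) or die "cannot start $cmd[0]: $!";
    my @lines = <$fh>;              # one element per line, as with backticks
    close($fh) or warn "command exited with status $?";

Because no shell is involved, metacharacters in the arguments reach the
program unchanged, but redirections such as C<< 2>&1 >> are not available.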

On some platforms (notably DOS-like ones), the shell may not be
capable of dealing with multiline commands, so putting newlines in
the string may not get you what you want.  You may be able to evaluate
multiple commands in a single line by separating them with the command
separator character, if your shell supports that (e.g. C<;> on many Unix
shells; C<&> on the Windows NT C<cmd> shell).

Beginning with v5.6.0, Perl will attempt to flush all files opened for
output before starting the child process, but this may not be supported
on some platforms (see L<perlport>).  To be safe, you may need to set
C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
C<IO::Handle> on any open handles.

Beware that some command shells may place restrictions on the length
of the command line.  You must ensure your strings don't exceed this
limit after any necessary interpolations.  See the platform-specific
release notes for more details about your particular environment.

Using this operator can lead to programs that are difficult to port,
because the shell commands called vary between systems, and may in
fact not be present at all.  As one example, the C<type> command under
the POSIX shell is very different from the C<type> command under DOS.
That doesn't mean you should go out of your way to avoid backticks
when they're the right way to get something done.  Perl was made to be
a glue language, and one of the things it glues together is commands.
Just understand what you're getting yourself into.

See L<"I/O Operators"> for more discussion.

=item qw/STRING/

Evaluates to a list of the words extracted out of STRING, using embedded
whitespace as the word delimiters.
It can be understood as being roughly
equivalent to:

    split(' ', q/STRING/);

the differences being that it generates a real list at compile time, and
in scalar context it returns the last element in the list.  So
this expression:

    qw(foo bar baz)

is semantically equivalent to the list:

    'foo', 'bar', 'baz'

Some frequently seen examples:

    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );

A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string.  For this reason, the
C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the STRING contains the "," or the "#" character.

=item s/PATTERN/REPLACEMENT/egimosx

Searches a string for a pattern, and if found, replaces that pattern
with the replacement text and returns the number of substitutions
made.  Otherwise it returns false (specifically, the empty string).

If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified.  (The string specified with C<=~> must
be a scalar variable, an array element, a hash element, or an assignment
to one of those, i.e., an lvalue.)

If the delimiter chosen is a single quote, no interpolation is
done on either the PATTERN or the REPLACEMENT.  Otherwise, if the
PATTERN contains a $ that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
at run-time.  If you want the pattern compiled only once the first time
the variable is interpolated, use the C</o> option.  If the pattern
evaluates to the empty string, the last successfully executed regular
expression is used instead.  See L<perlre> for further explanation on these.
See L<perllocale> for discussion of additional considerations that apply
when C<use locale> is in effect.

Options are:

    e   Evaluate the right side as an expression.
    g   Replace globally, i.e., all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.

Any non-alphanumeric, non-whitespace delimiter may replace the
slashes.  If single quotes are used, no interpretation is done on the
replacement string (the C</e> modifier overrides this, however).  Unlike
Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
text is not evaluated as a command.  If the
PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
pair of quotes, which may or may not be bracketing quotes, e.g.,
C<s(foo)(bar)> or C<< s<foo>/bar/ >>.  A C</e> will cause the
replacement portion to be treated as a full-fledged Perl expression
and evaluated right then and there.  It is, however, syntax checked at
compile-time.  A second C<e> modifier will cause the replacement portion
to be C<eval>ed before being run as a Perl expression.
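
To make the difference between one and two C<e> modifiers concrete, here
is a small sketch with a made-up string; the replacement C<$1> is an
expression in the first case and is evaluated a second time in the other:

    my $text = 'total: 3+4';
    (my $once  = $text) =~ s{(\d+\+\d+)}{$1}e;      # 'total: 3+4'
    (my $twice = $text) =~ s{(\d+\+\d+)}{$1}ee;     # 'total: 7'

Since the second C<e> runs the matched text as Perl code, it should not be
applied to strings from an untrusted source.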

Examples:

    s/\bgreen\b/mauve/g;                # don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/;         # run-time pattern

    ($foo = $bar) =~ s/this/that/;      # copy first, then change

    $count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;               # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
    s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'

    s/%(.)/$percent{$1}/g;      # change percent escapes; no /e
    s/%(.)/$percent{$1} || $&/ge;       # expr now, so /e
    s/^=(\w+)/&pod($1)/ge;      # use function call

    # expand variables in $_, but dynamics only, using
    # symbolic dereferencing
    s/\$(\w+)/${$1}/g;

    # Add one to the value of any numbers in the string
    s/(\d+)/1 + $1/eg;

    # This will expand any embedded scalar variable
    # (including lexicals) in $_ : First $1 is interpolated
    # to the variable name, and then evaluated
    s/(\$\w+)/$1/eeg;

    # Delete (most) C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;

    s/^\s*(.*?)\s*$/$1/;        # trim white space in $_, expensively

    for ($variable) {           # trim white space in $variable, cheap
        s/^\s+//;
        s/\s+$//;
    }

    s/([^ ]*) *([^ ]*)/$2 $1/;  # reverse 1st two fields

Note the use of $ instead of \ in the last example.  Unlike
B<sed>, we use the \<I<digit>> form in only the left hand side.
Anywhere else it's $<I<digit>>.

Occasionally, you can't use just a C</g> to get all the changes
to occur that you might want.
Here are two common cases:

    # put commas in the right places in an integer
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;

=item tr/SEARCHLIST/REPLACEMENTLIST/cds

=item y/SEARCHLIST/REPLACEMENTLIST/cds

Transliterates all occurrences of the characters found in the search list
with the corresponding character in the replacement list.  It returns
the number of characters replaced or deleted.  If no string is
specified via the =~ or !~ operator, the $_ string is transliterated.  (The
string specified with =~ must be a scalar variable, an array element, a
hash element, or an assignment to one of those, i.e., an lvalue.)

A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
For B<sed> devotees, C<y> is provided as a synonym for C<tr>.  If the
SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
its own pair of quotes, which may or may not be bracketing quotes,
e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.

Note that C<tr> does B<not> do regular expression character classes
such as C<\d> or C<[:lower:]>.  The C<tr> operator is not equivalent to
the tr(1) utility.  If you want to map strings between lower/upper
cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
using the C<s> operator if you need regular expressions.

Note also that the whole range idea is rather unportable between
character sets--and even within character sets they may cause results
you probably didn't expect.  A sound principle is to use only ranges
that begin from and end at either alphabets of equal case (a-e, A-E),
or digits (0-4).  Anything else is unsafe.  If in doubt, spell out the
character sets in full.

Options:

    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.

If the C</c> modifier is specified, the SEARCHLIST character set
is complemented.  If the C</d> modifier is specified, any characters
specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
(Note that this is slightly more flexible than the behavior of some
B<tr> programs, which delete anything they find in the SEARCHLIST,
period.) If the C</s> modifier is specified, sequences of characters
that were transliterated to the same character are squashed down
to a single instance of the character.

If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
exactly as specified.  Otherwise, if the REPLACEMENTLIST is shorter
than the SEARCHLIST, the final character is replicated till it is long
enough.  If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.

Examples:

    $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case

    $cnt = tr/*/*/;             # count the stars in $_

    $cnt = $sky =~ tr/*/*/;     # count the stars in $sky

    $cnt = tr/0-9//;            # count the digits in $_

    tr/a-zA-Z//s;               # bookkeeper -> bokeper

    ($HOST = $host) =~ tr/a-z/A-Z/;

    tr/a-zA-Z/ /cs;             # change non-alphas to single space

    tr [\200-\377]
       [\000-\177];             # delete 8th bit

If multiple transliterations are given for a character, only the
first one is used:

    tr/AAA/XYZ/

will transliterate any A to X.

Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
interpolation.
That means that if you want to use variables, you
must use an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;

=item <<EOF

A line-oriented form of quoting is based on the shell "here-document"
syntax.  Following a C<< << >> you specify a string to terminate
the quoted material, and all lines following the current line down to
the terminating string are the value of the item.  The terminating
string may be either an identifier (a word), or some quoted text.  If
quoted, the type of quotes you use determines the treatment of the
text, just as in regular quoting.  An unquoted identifier works like
double quotes.  There must be no space between the C<< << >> and
the identifier, unless the identifier is quoted.  (If you put a space it
will be treated as a null identifier, which is valid, and matches the first
empty line.)  The terminating string must appear by itself (unquoted and
with no surrounding whitespace) on the terminating line.

    print <<EOF;
    The price is $Price.
    EOF

    print << "EOF"; # same as above
    The price is $Price.
    EOF

    print << `EOC`; # execute commands
    echo hi there
    echo lo there
    EOC

    print <<"foo", <<"bar"; # you can stack them
    I said foo.
    foo
    I said bar.
    bar

    myfunc(<< "THIS", 23, <<'THAT');
    Here's a line
    or two.
    THIS
    and here's another.
    THAT

Just don't forget that you have to put a semicolon on the end
to finish the statement, as Perl doesn't know you're not going to
try to do this:

    print <<ABC
    179231
    ABC
       + 20;

If you want your here-docs to be indented with the
rest of the code, you'll need to remove leading whitespace
from each line manually:

    ($quote = <<'FINIS') =~ s/^\s+//gm;
       The Road goes ever on and on,
       down from the door where it began.
    FINIS

If you use a here-doc within a delimited construct, such as in C<s///eg>,
the quoted material must come on the lines following the final delimiter.
So instead of

    s/this/<<E . 'that'
    the other
    E
     . 'more '/eg;

you have to write

    s/this/<<E . 'that'
     . 'more '/eg;
    the other
    E

If the terminating identifier is on the last line of the program, you
must be sure there is a newline after it; otherwise, Perl will give the
warning B<Can't find string terminator "END" anywhere before EOF...>.

Additionally, the quoting rules for the identifier are not related to
Perl's quoting rules -- C<q()>, C<qq()>, and the like are not supported
in place of C<''> and C<"">, and the only interpolation is for backslashing
the quoting character:

    print << "abc\"def";
    testing...
    abc"def

Finally, quoted strings cannot span multiple lines.  The general rule is
that the identifier must be a string literal.  Stick with that, and you
should be safe.

=back

=head2 Gory details of parsing quoted constructs

When presented with something that might have several different
interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
principle to pick the most probable interpretation.  This strategy
is so successful that Perl programmers often do not suspect the
ambivalence of what they write.
But from time to time, Perl's
notions differ substantially from what the author honestly meant.

This section hopes to clarify how Perl handles quoted constructs.
Although the most common reason to learn this is to unravel labyrinthine
regular expressions, because the initial steps of parsing are the
same for all quoting operators, they are all discussed together.

The most important Perl parsing rule is the first one discussed
below: when processing a quoted construct, Perl first finds the end
of that construct, then interprets its contents.  If you understand
this rule, you may skip the rest of this section on the first
reading.  The other rules are likely to contradict the user's
expectations much less frequently than this first one.

Some passes discussed below are performed concurrently, but because
their results are the same, we consider them individually.  For different
quoting constructs, Perl performs different numbers of passes, from
one to five, but these passes are always performed in the same order.

=over 4

=item Finding the end

The first pass is finding the end of the quoted construct, whether
it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
construct, a C</> that terminates a C<qq//> construct, a C<]> which
terminates a C<qq[]> construct, or a C<< > >> which terminates a
fileglob started with C<< < >>.

When searching for single-character non-pairing delimiters, such
as C</>, combinations of C<\\> and C<\/> are skipped.  However,
when searching for a single-character pairing delimiter like C<[>,
combinations of C<\\>, C<\]>, and C<\[> are all skipped, and nested
C<[>, C<]> are skipped as well.  When searching for multicharacter
delimiters, nothing is skipped.

For constructs with three-part delimiters (C<s///>, C<y///>, and
C<tr///>), the search is repeated once more.
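
The skipping rules can be seen with C<q//>, where nothing else interferes.
In this sketch (the variable names are only illustrative) the comments give
the resulting strings:

    my $nested  = q[a [nested] pair];   # inner brackets nest: a [nested] pair
    my $escaped = q[one \] alone];      # \] is skipped over:  one ] alone
    my $slash   = q/either\/or/;        # \/ is skipped over:  either/or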

During this search no attention is paid to the semantics of the construct.
Thus:

    "$hash{"$foo/$bar"}"

or:

    m/
      bar       # NOT a comment, this slash / terminated m//!
     /x

do not form legal quoted expressions.   The quoted part ends on the
first C<"> and C</>, and the rest happens to be a syntax error.
Because the slash that terminated C<m//> was followed by a C<SPACE>,
the example above is not C<m//x>, but rather C<m//> with no C</x>
modifier.  So the embedded C<#> is interpreted as a literal C<#>.

=item Removal of backslashes before delimiters

During the second pass, text between the starting and ending
delimiters is copied to a safe location, and the C<\> is removed
from combinations consisting of C<\> and delimiter--or delimiters,
meaning both starting and ending delimiters will be handled, should
these differ.
This removal does not happen for multi-character delimiters.
Note that the combination C<\\> is left intact, just as it was.

Starting from this step no information about the delimiters is
used in parsing.

=item Interpolation

The next step is interpolation in the text obtained, which is now
delimiter-independent.  There are four different cases.

=over 4

=item C<<<'EOF'>, C<m''>, C<s'''>, C<tr///>, C<y///>

No interpolation is performed.

=item C<''>, C<q//>

The only interpolation is removal of C<\> from pairs C<\\>.

=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>

C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
converted to corresponding Perl constructs.  Thus, C<"$foo\Qbaz$bar">
is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
The other combinations are replaced with appropriate expansions.

Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
is interpolated in the usual way.
Something like C<"\Q\\E"> has
no C<\E> inside.  Instead, it has C<\Q>, C<\\>, and C<E>, so the
result is the same as for C<"\\\\E">.  As a general rule, backslashes
between C<\Q> and C<\E> may lead to counterintuitive results.  So,
C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
as C<"\\\t"> (since TAB is not alphanumeric).  Note also that:

    $str = '\t';
    return "\Q$str";

may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.

Interpolated scalars and arrays are converted internally to the C<join> and
C<.> catenation operations.  Thus, C<"$foo XXX '@arr'"> becomes:

    $foo . " XXX '" . (join $", @arr) . "'";

All operations above are performed simultaneously, left to right.

Because the result of C<"\Q STRING \E"> has all metacharacters
quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair.  If protected by C<\>, C<$> will be quoted to become
C<"\\\$">; if not, it is interpreted as the start of an interpolated
scalar.

Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends.  For instance, whether
C<< "a $b -> {c}" >> really means:

    "a " . $b . " -> {c}";

or:

    "a " . $b -> {c};

Most of the time, the choice is the longest possible text that does
not include spaces between components and which contains matching
braces or brackets.  Because the outcome may be determined by voting
based on heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.

=item C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>,

Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
happens (almost) as with C<qq//> constructs, but the substitution
of C<\> followed by RE-special chars (including C<\>) is not
performed.
Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
a C<#>-comment in a C<//x>-regular expression, no processing is
performed whatsoever.  This is the first step at which the presence
of the C<//x> modifier is relevant.

Interpolation has several quirks: C<$|>, C<$(>, and C<$)> are not
interpolated, and constructs C<$var[SOMETHING]> are voted (by several
different estimators) to be either an array element or C<$var>
followed by an RE alternative.  This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
array element C<-9>, not as a regular expression from the variable
C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>.  Since voting among different estimators may occur,
the result is not predictable.

It is at this step that C<\1> is begrudgingly converted to C<$1> in
the replacement text of C<s///> to correct the incorrigible
I<sed> hackers who haven't picked up the saner idiom yet.  A warning
is emitted if the C<use warnings> pragma or the B<-w> command-line flag
(that is, the C<$^W> variable) was set.

The lack of processing of C<\\> creates specific restrictions on
the post-processed text.  If the delimiter is C</>, one cannot get
the combination C<\/> into the result of this step.  C</> will
finish the regular expression, C<\/> will be stripped to C</> on
the previous step, and C<\\/> will be left as is.  Because C</> is
equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
alphanumeric char, as in:

    m m ^ a \s* b mmx;

In the RE above, which is intentionally obfuscated for illustration, the
delimiter is C<m>, the modifier is C<mx>, and after backslash-removal the
RE is the same as for C<m/ ^ a \s* b /mx>.
There's more than one
reason you're encouraged to restrict your delimiters to non-alphanumeric,
non-whitespace choices.

=back

This step is the last one for all constructs except regular expressions,
which are processed further.

=item Interpolation of regular expressions

Previous steps were performed during the compilation of Perl code,
but this one happens at run time--although it may be optimized to
be calculated at compile time if appropriate.  After the preprocessing
described above, and possibly after evaluation if catenation,
joining, casing translation, or metaquoting are involved, the
resulting I<string> is passed to the RE engine for compilation.

Whatever happens in the RE engine might be better discussed in L<perlre>,
but for the sake of continuity, we shall do so here.

This is another step where the presence of the C<//x> modifier is
relevant.  The RE engine scans the string from left to right and
converts it to a finite automaton.

Backslashed characters are either replaced with corresponding
literal strings (as with C<\{>), or else they generate special nodes
in the finite automaton (as with C<\b>).  Characters special to the
RE engine (such as C<|>) generate corresponding nodes or groups of
nodes.  C<(?#...)> comments are ignored.  All the rest is either
converted to literal strings to match, or else is ignored (as are
whitespace and C<#>-style comments if C<//x> is present).

Parsing of the bracketed character class construct, C<[...]>, is
rather different than the rule used for the rest of the pattern.
The terminator of this construct is found using the same rules as
for finding the terminator of a C<{}>-delimited construct, the only
exception being that C<]> immediately following C<[> is treated as
though preceded by a backslash.
Similarly, the terminator of
C<(?{...})> is found using the same rules as for finding the
terminator of a C<{}>-delimited construct.

It is possible to inspect both the string given to the RE engine and the
resulting finite automaton.  See the arguments C<debug>/C<debugcolor>
in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
switch documented in L<perlrun/"Command Switches">.

=item Optimization of regular expressions

This step is listed for completeness only.  Since it does not change
semantics, details of this step are not documented and are subject
to change without notice.  This step is performed over the finite
automaton that was generated during the previous pass.

It is at this stage that C<split()> silently optimizes C</^/> to
mean C</^/m>.

=back

=head2 I/O Operators

There are several I/O operators you should know about.

A string enclosed by backticks (grave accents) first undergoes
double-quote interpolation.  It is then interpreted as an external
command, and the output of that command is the value of the
backtick string, like in a shell.  In scalar context, a single string
consisting of all output is returned.  In list context, a list of
values is returned, one per line of output.  (You can set C<$/> to use
a different line terminator.)  The command is executed each time the
pseudo-literal is evaluated.  The status value of the command is
returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
Unlike in B<csh>, no translation is done on the return data--newlines
remain newlines.  Unlike in any of the shells, single quotes do not
hide variable names in the command from interpretation.  To pass a
literal dollar-sign through to the shell you need to hide it with a
backslash.  The generalized form of backticks is C<qx//>.
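
As a hedged illustration of the two contexts, the sketch below runs a
child perl through backticks.  It assumes that C<$^X> (the path of the
running interpreter) contains no spaces or shell metacharacters and that
the shell accepts double-quoted arguments; the one-liner it runs is made
up for this example.

```perl
use strict;
use warnings;

# $^X is interpolated by Perl before the shell sees the command.
my $cmd = qq{$^X -le "print for 1..3"};

my $all   = `$cmd`;    # scalar context: all output as one string
my @lines = `$cmd`;    # list context: one element per line of output

printf "scalar context returned %d characters\n", length $all;
printf "list context returned %d lines\n", scalar @lines;
printf "exit status of the last command: %d\n", $? >> 8;
```

The command runs twice here, once per backtick expression, which matches
the statement above that it is executed each time the pseudo-literal is
evaluated.
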
(Because
backticks always undergo shell expansion as well, see L<perlsec> for
security concerns.)

In scalar context, evaluating a filehandle in angle brackets yields
the next line from that file (the newline, if any, included), or
C<undef> at end-of-file or on error.  When C<$/> is set to C<undef>
(sometimes known as file-slurp mode) and the file is empty, it
returns C<''> the first time, followed by C<undef> subsequently.

Ordinarily you must assign the returned value to a variable, but
there is one situation where an automatic assignment happens.  If
and only if the input symbol is the only thing inside the conditional
of a C<while> statement (even if disguised as a C<for(;;)> loop),
the value is automatically assigned to the global variable $_,
destroying whatever was there previously.  (This may seem like an
odd thing to you, but you'll use the construct in almost every Perl
script you write.)  The $_ variable is not implicitly localized.
You'll have to put a C<local $_;> before the loop if you want that
to happen.

The following lines are equivalent:

    while (defined($_ = <STDIN>)) { print; }
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while ($_ = <STDIN>);
    print while <STDIN>;

This also behaves similarly, but avoids $_ :

    while (my $line = <STDIN>) { print $line }

In these loop constructs, the assigned value (whether assignment
is automatic or explicit) is then tested to see whether it is
defined.  The defined test avoids problems where a line has a string
value that would be treated as false by Perl, for example a "" or
a "0" with no trailing newline.  If you really mean for such values
to terminate the loop, they should be tested for explicitly:

    while (($_ = <STDIN>) ne '0') { ... }
    while (<STDIN>) { last unless $_; ... }

In other boolean contexts, C<< <I<filehandle>> >> without an
explicit C<defined> test or comparison elicits a warning if the
C<use warnings> pragma or the B<-w>
command-line switch (the C<$^W> variable) is in effect.

The filehandles STDIN, STDOUT, and STDERR are predefined.  (The
filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
in packages, where they would be interpreted as local identifiers
rather than global.)  Additional filehandles may be created with
the open() function, amongst others.  See L<perlopentut> and
L<perlfunc/open> for details on this.

If a <FILEHANDLE> is used in a context that is looking for
a list, a list comprising all input lines is returned, one line per
list element.  It's easy to grow to a rather large data space this
way, so use with care.

<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
See L<perlfunc/readline>.

The null filehandle <> is special: it can be used to emulate the
behavior of B<sed> and B<awk>.  Input from <> comes either from
standard input, or from each file listed on the command line.  Here's
how it works: the first time <> is evaluated, the @ARGV array is
checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
gives you standard input.  The @ARGV array is then processed as a list
of filenames.  The loop

    while (<>) {
        ...                     # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...         # code for each line
        }
    }

except that it isn't so cumbersome to say, and will actually work.
It really does shift the @ARGV array and put the current filename
into the $ARGV variable.
It also uses filehandle I<ARGV>
internally--<> is just a synonym for <ARGV>, which
is magical.  (The pseudo code above doesn't work because it treats
<ARGV> as non-magical.)

You can modify @ARGV before the first <> as long as the array ends up
containing the list of filenames you really want.  Line numbers (C<$.>)
continue as though the input were one big happy file.  See the example
in L<perlfunc/eof> for how to reset line numbers on each file.

If you want to set @ARGV to your own list of files, go right ahead.
This sets @ARGV to all plain text files if no @ARGV was given:

    @ARGV = grep { -f && -T } glob('*') unless @ARGV;

You can even set them to pipe commands.  For example, this automatically
filters compressed arguments through B<gzip>:

    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;

If you want to pass switches into your script, you can use one of the
Getopts modules or put a loop on the front like this:

    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        # ...           # other switches
    }

    while (<>) {
        # ...           # code for each line
    }

The <> symbol will return C<undef> for end-of-file only once.
If you call it again after this, it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will read input from STDIN.

If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the
filehandle to input from, or its typeglob, or a reference to the
same.
For example:

    $fh = \*STDIN;
    $line = <$fh>;

If what's within the angle brackets is neither a filehandle nor a simple
scalar variable containing a filehandle name, typeglob, or typeglob
reference, it is interpreted as a filename pattern to be globbed, and
either a list of filenames or the next filename in the list is returned,
depending on context.  This distinction is determined on syntactic
grounds alone.  That means C<< <$x> >> is always a readline() from
an indirect handle, but C<< <$hash{key}> >> is always a glob().
That's because $x is a simple scalar variable, but C<$hash{key}> is
not--it's a hash element.

One level of double-quote interpretation is done first, but you can't
say C<< <$foo> >> because that's an indirect filehandle as explained
in the previous paragraph.  (In older versions of Perl, programmers
would insert curly brackets to force interpretation as a filename glob:
C<< <${foo}> >>.  These days, it's considered cleaner to call the
internal function directly as C<glob($foo)>, which is probably the right
way to have done it in the first place.)  For example:

    while (<*.c>) {
        chmod 0644, $_;
    }

is roughly equivalent to:

    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chomp;
        chmod 0644, $_;
    }

except that the globbing is actually done internally using the standard
C<File::Glob> extension.  Of course, the shortest way to do the above is:

    chmod 0644, <*.c>;

A (file)glob evaluates its (embedded) argument only when it is
starting a new list.  All values must be read before it will start
over.  In list context, this isn't important because you automatically
get them all anyway.  However, in scalar context the operator returns
the next value each time it's called, or C<undef> when the list has
run out.
As with filehandle reads, an automatic C<defined> is
generated when the glob occurs in the test part of a C<while>,
because legal glob returns (e.g. a file called F<0>) would otherwise
terminate the loop.  Again, C<undef> is returned only once.  So if
you're expecting a single value from a glob, it is much better to
say

    ($file) = <blurch*>;

than

    $file = <blurch*>;

because the latter will alternate between returning a filename and
returning false.

If you're trying to do variable interpolation, it's definitely better
to use the glob() function, because the older notation can cause people
to become confused with the indirect filehandle notation.

    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);

=head2 Constant Folding

Like C, Perl does a certain amount of expression evaluation at
compile time whenever it determines that all arguments to an
operator are static and have no side effects.  In particular, string
concatenation happens at compile time between literals that don't do
variable substitution.  Backslash interpolation also happens at
compile time.  You can say

    'Now is the time for all' . "\n" .
        'good men to come to.'

and this all reduces to one string internally.  Likewise, if
you say

    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) {  }
    }

the compiler will precompute the number which that expression
represents so that the interpreter won't have to.

=head2 Bitwise String Operators

Bitstrings of any size may be manipulated by the bitwise operators
(C<~ | & ^>).

If the operands to a binary bitwise op are strings of different
sizes, B<|> and B<^> ops act as though the shorter operand had
additional zero bits on the right, while the B<&> op acts as though
the longer operand were truncated to the length of the shorter.
The granularity for such extension or truncation is one or more
bytes.

    # ASCII-based examples
    print "j p \n" ^ " a h";            # prints "JAPH\n"
    print "JA" | "  ph\n";              # prints "japh\n"
    print "japh\nJunk" & '_____';       # prints "JAPH\n";
    print 'p N$' ^ " E<H\n";            # prints "Perl\n";

If you are intending to manipulate bitstrings, be certain that
you're supplying bitstrings: If an operand is a number, that will imply
a B<numeric> bitwise operation.  You may explicitly show which type of
operation you intend by using C<""> or C<0+>, as in the examples below.

    $foo =  150  |  105 ;       # yields 255  (0x96 | 0x69 is 0xFF)
    $foo = '150' |  105 ;       # yields 255
    $foo =  150  | '105';       # yields 255
    $foo = '150' | '105';       # yields string '155' (under ASCII)

    $baz = 0+$foo & 0+$bar;     # both ops explicitly numeric
    $biz = "$foo" ^ "$bar";     # both ops explicitly stringy

See L<perlfunc/vec> for information on how to manipulate individual bits
in a bit vector.

=head2 Integer Arithmetic

By default, Perl assumes that it must do most of its arithmetic in
floating point.  But by saying

    use integer;

you may tell the compiler that it's okay to use integer operations
(if it feels like it) from here to the end of the enclosing BLOCK.
An inner BLOCK may countermand this by saying

    no integer;

which lasts until the end of that BLOCK.  Note that this doesn't
mean everything is only an integer, merely that Perl may use integer
operations if it is so inclined.  For example, even under C<use
integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
or so.

Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
and ">>") always produce integral results.  (But see also
L<Bitwise String Operators>.)  However, C<use integer> still has meaning for
them.
By default, their results are interpreted as unsigned integers, but
if C<use integer> is in effect, their results are interpreted
as signed integers.  For example, C<~0> usually evaluates to a large
integral value.  However, C<use integer; ~0> is C<-1> on twos-complement
machines.

=head2 Floating-point Arithmetic

While C<use integer> provides integer-only arithmetic, there is no
analogous mechanism to provide automatic rounding or truncation to a
certain number of decimal places.  For rounding to a certain number
of digits, sprintf() or printf() is usually the easiest route.
See L<perlfaq4>.

Floating-point numbers are only approximations to what a mathematician
would call real numbers.  There are infinitely more reals than floats,
so some corners must be cut.  For example:

    printf "%.20g\n", 123456789123456789;
    #        produces 123456789123456784

Testing for exact floating-point equality or inequality is
not a good idea.  Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
decimal places.  See Knuth, volume II, for a more robust treatment of
this topic.

    sub fp_equal {
        my ($X, $Y, $POINTS) = @_;
        my ($tX, $tY);
        $tX = sprintf("%.${POINTS}g", $X);
        $tY = sprintf("%.${POINTS}g", $Y);
        return $tX eq $tY;
    }

The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and other mathematical and trigonometric functions.
The Math::Complex module (part of the standard perl distribution)
defines mathematical functions that work on both the reals and the
imaginary numbers.  Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely.
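
As a hedged illustration of why the method matters, the sketch below
contrasts sprintf() with a hypothetical helper, round_half_up_cents(),
that rounds a decimal I<string> half up using integer arithmetic only.
The helper, its name, and its restriction to plain non-negative decimal
strings are assumptions made for this example; they are not part of Perl.

```perl
use strict;
use warnings;

# 1.005 has no exact binary representation; the stored double is
# slightly below 1.005, so sprintf typically yields "1.00" here.
my $via_sprintf = sprintf("%.2f", 1.005);

# Hypothetical helper: round a non-negative decimal string to two
# places, half up, without going through floating point.
sub round_half_up_cents {
    my ($amount) = @_;
    my ($whole, $frac) = $amount =~ /^(\d+)(?:\.(\d*))?$/
        or die "not a plain non-negative decimal: $amount";
    $frac = '' unless defined $frac;
    $frac .= '000';                         # ensure three digits exist
    my $cents = $whole * 100 + substr($frac, 0, 2);
    $cents++ if substr($frac, 2, 1) >= 5;   # third digit decides
    return sprintf("%d.%02d", int($cents / 100), $cents % 100);
}

my $via_helper = round_half_up_cents('1.005');

print "sprintf: $via_sprintf\n";
print "helper:  $via_helper\n";
```

Keeping the amount as a string (or as an integer count of cents) is one
way to make the rounding rule explicit; other rules, such as
round-half-even, would need a different helper.
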
In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

=head2 Bigger Numbers

The standard Math::BigInt and Math::BigFloat modules provide
variable-precision arithmetic and overloaded operators, although
they're currently pretty slow.  At the cost of some space and
considerable speed, they avoid the normal pitfalls associated with
limited-precision representations.

    use Math::BigInt;
    $x = Math::BigInt->new('123456789123456789');
    print $x * $x;

    # prints +15241578780673678515622620750190521

There are several modules that let you calculate with (bound only by
memory and cpu-time) unlimited or fixed precision.  There are also
some non-standard modules that provide faster implementations via
external C libraries.

Here is a short, but incomplete summary:

    Math::Fraction          big, unlimited fractions like 9973 / 12967
    Math::String            treat string sequences like numbers
    Math::FixedPrecision    calculate with a fixed precision
    Math::Currency          for currency calculations
    Bit::Vector             manipulate bit vectors fast (uses C)
    Math::BigIntFast        Bit::Vector wrapper for big numbers
    Math::Pari              provides access to the Pari C library
    Math::BigInteger        uses an external C library
    Math::Cephes            uses external Cephes C library (no big numbers)
    Math::Cephes::Fraction  fractions via the Cephes library
    Math::GMP               another one using an external C library

Choose wisely.

=cut