xref: /onnv-gate/usr/src/cmd/perl/5.8.4/distrib/pod/perlop.pod (revision 0:68f95e015346)
=head1 NAME

perlop - Perl operators and precedence

=head1 DESCRIPTION

=head2 Operator Precedence and Associativity

Operator precedence and associativity work in Perl more or less like
they do in mathematics.

I<Operator precedence> means some operators are evaluated before
others.  For example, in C<2 + 4 * 5>, the multiplication has higher
precedence so C<4 * 5> is evaluated first yielding C<2 + 20 ==
22> and not C<6 * 5 == 30>.

I<Operator associativity> defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first or the right.  For example, in C<8
- 4 - 2>, subtraction is left associative so Perl evaluates the
expression left to right.  C<8 - 4> is evaluated first making the
expression C<4 - 2 == 2> and not C<8 - 2 == 6>.

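As a minimal sketch of the two ideas above, the following snippet
evaluates the two expressions from the text plus one right-associative
case.  The variable names are illustrative only.

```perl
use strict;
use warnings;

# Precedence: multiplication binds tighter than addition.
my $prec = 2 + 4 * 5;          # parsed as 2 + (4 * 5)

# Left associativity: subtraction groups from the left.
my $left = 8 - 4 - 2;          # parsed as (8 - 4) - 2

# Right associativity: exponentiation groups from the right.
my $right = 2 ** 3 ** 2;       # parsed as 2 ** (3 ** 2)

print "$prec $left $right\n";  # prints "22 2 512"
```
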
Perl operators have the following associativity and precedence,
listed from highest precedence to lowest.  Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy.  (This makes learning Perl easier
for C folks.)  With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp
    left        &
    left        | ^
    left        &&
    left        ||
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects.  See L<overload>.

=head2 Terms and List Operators (Leftward)

A TERM has the highest precedence in Perl.  TERMs include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized.  Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments.  These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort,
but the commas on the left are evaluated after.  In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance.  The parentheses
enclose the argument list for C<print> which is evaluated (printing
the result of C<$foo & 255>).  Then one is added to the return value
of C<print> (usually 1).  The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");

See L<Named Unary Operators> for more discussion of this.

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L<"I/O Operators">.

=head2 The Arrow Operator

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++.  If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.)  See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name).  See L<perlobj>.

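The following sketch walks through each form of the arrow operator
described above.  The C<Counter> package and its C<new> and C<bump>
methods are hypothetical, made up purely for illustration.

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { color => 'red' };
my $cref = sub { return "called with @_" };

my $second = $aref->[1];          # array element through a reference
my $color  = $href->{color};      # hash element through a reference
my $result = $cref->(1, 2);       # subroutine call through a reference

{
    package Counter;              # hypothetical class for illustration
    sub new  { my ($class) = @_; return bless { n => 0 }, $class }
    sub bump { my ($self)  = @_; return ++$self->{n} }
}

my $obj    = Counter->new;        # left side is a class name
my $method = 'bump';
$obj->bump;                       # left side is an object
my $count  = $obj->$method;       # method name held in a scalar

print "$second $color $result $count\n";   # prints "20 red called with 1 2 2"
```
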
=head2 Auto-increment and Auto-decrement

"++" and "--" work as in C.  That is, if placed before a variable,
they increment or decrement the variable by one before returning the
value, and if placed after, increment or decrement after returning the
value.

    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1

The auto-increment operator has a little extra builtin magic to it.  If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment.  If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).

The auto-decrement operator is not magical.

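As a small sketch of the three remaining cases (carry across mixed
case and digits, post-increment of C<undef>, and the non-magical
decrement), something like the following can be tried.  The variable
names are illustrative.

```perl
use strict;
use warnings;

my $s = 'Zz';
$s++;                        # string increment with carry: 'Zz' becomes 'AAa'

my $n = 'a9';
$n++;                        # digit carries into the letter: 'a9' becomes 'b0'

my $u;                       # starts out undef
my $old = $u++;              # post-increment returns 0, not undef

my $d = 'ab';
{
    no warnings 'numeric';   # 'ab' is not numeric, so -- would warn here
    $d--;                    # no magic: 'ab' counts as 0, giving -1
}

print "$s $n $old $u $d\n";  # prints "AAa b0 0 1 -1"
```
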
=head2 Exponentiation

Binary "**" is the exponentiation operator.  It binds even more
tightly than unary minus, so C<-2**4> is C<-(2**4)>, not C<(-2)**4>.  (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)

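A short sketch of how C<**> interacts with unary minus on either side;
the variable names are illustrative.

```perl
use strict;
use warnings;

my $tight  = -2 ** 4;     # parsed as -(2 ** 4)
my $parens = (-2) ** 4;   # parentheses force the negation first
my $frac   = 2 ** -1;     # a negative exponent gives a fraction

print "$tight $parens $frac\n";   # prints "-16 16 0.5"
```
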
=head2 Symbolic Unary Operators

Unary "!" performs logical negation, i.e., "not".  See also C<not> for a lower
precedence version of this.

Unary "-" performs arithmetic negation if the operand is numeric.  If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned.  Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned.  One effect of these rules is that C<-bareword> is equivalent
to C<"-bareword">.

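The rules for unary minus can be sketched with quoted operands, which
keeps the example independent of how barewords are handled.  The
variable names are illustrative.

```perl
use strict;
use warnings;

my $number  = -"10";     # numeric operand: arithmetic negation
my $ident   = -"foo";    # identifier: minus sign is prepended
my $flipped = -"-foo";   # leading minus becomes plus
my $plus    = -"+foo";   # leading plus becomes minus

print "$number $ident $flipped $plus\n";   # prints "-10 -foo +foo -foo"
```
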
Unary "~" performs bitwise negation, i.e., 1's complement.  For
example, C<0666 & ~027> is 0640.  (See also L<Integer Arithmetic> and
L<Bitwise String Operators>.)  Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the & operator to mask off the excess bits.

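A minimal sketch of the umask-style example above, together with the
masking advice: the second expression masks C<~0> down to 32 bits so
the result is the same regardless of the platform's integer width.

```perl
use strict;
use warnings;

my $perm  = 0666 & ~027;        # clear the bits set in 027
my $low32 = ~0 & 0xFFFFFFFF;    # keep only the low 32 bits

printf "%04o %x\n", $perm, $low32;   # prints "0640 ffffffff"
```
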
Unary "+" has no effect whatsoever, even on strings.  It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments.  (See examples above under L<Terms and List Operators (Leftward)>.)

Unary "\" creates a reference to whatever follows it.  See L<perlreftut>
and L<perlref>.  Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.

=head2 Binding Operators

Binary "=~" binds a scalar expression to a pattern match.  Certain operations
search or modify the string $_ by default.  This operator makes that kind
of operation work on some other string.  The right argument is a search
pattern, substitution, or transliteration.  The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_.  When used in scalar context, the return value generally indicates the
success of the operation.  Behavior in list context depends on the particular
operator.  See L</"Regexp Quote-Like Operators"> for details.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time.

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.

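The following sketch exercises each kind of right argument mentioned
above: a match, a substitution, a transliteration, and a plain
expression treated as a pattern at run time.  The variable names are
illustrative.

```perl
use strict;
use warnings;

my $text = 'Hello, World';

my $found   = $text =~ /World/;          # true: the pattern matches
my $missing = $text !~ /Perl/;           # true: the pattern does not match

(my $copy  = $text) =~ s/World/Perl/;    # substitute within a copy
(my $upper = $text) =~ tr/a-z/A-Z/;      # transliterate a copy

my $pattern = 'W\w+';                    # plain string, used as a pattern at run time
my $dynamic = $text =~ $pattern ? 'yes' : 'no';

print "$copy | $upper | $dynamic\n";     # prints "Hello, Perl | HELLO, WORLD | yes"
```
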
=head2 Multiplicative Operators

Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" computes the modulus of two numbers.  Given integer
operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
C<$a> minus the largest multiple of C<$b> that is not greater than
C<$a>.  If C<$b> is negative, then C<$a % $b> is C<$a> minus the
smallest multiple of C<$b> that is not less than C<$a> (i.e. the
result will be less than or equal to zero).
Note that when C<use integer> is in scope, "%" gives you direct access
to the modulus operator as implemented by your C compiler.  This
operator is not as well defined for negative operands, but it will
execute faster.

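The sign rules for "%" are easiest to see side by side.  This sketch
assumes C<use integer> is not in scope, so Perl's own definition
applies rather than the C compiler's.

```perl
use strict;
use warnings;

# The result takes the sign of the right operand (or is zero).
my @results = ( 7 % 3, -7 % 3, 7 % -3, -7 % -3 );

print "@results\n";   # prints "1 2 -2 -1"
```

For instance, C<-7 % 3> is C<-7> minus C<-9> (the largest multiple of 3
not greater than C<-7>), which is 2.
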
Binary "x" is the repetition operator.  In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand.  In list context, if the left operand is enclosed in
parentheses, it repeats the list.  If the right operand is zero or
negative, it returns an empty string or an empty list, depending on the
context.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5

=head2 Additive Operators

Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

=head2 Shift Operators

Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument.  Arguments should be
integers.  (See also L<Integer Arithmetic>.)

Binary ">>" returns the value of its left argument shifted right by
the number of bits specified by the right argument.  Arguments should
be integers.  (See also L<Integer Arithmetic>.)

Note that both "<<" and ">>" in Perl are implemented directly using
"<<" and ">>" in C.  If C<use integer> (see L<Integer Arithmetic>) is
in force then signed C integers are used, else unsigned C integers are
used.  Either way, the implementation isn't going to generate results
larger than the size of the integer type Perl was built with (32 bits
or 64 bits).

The result of overflowing the range of the integers is undefined
because it is undefined also in C.  In other words, using 32-bit
integers, C<< 1 << 32 >> is undefined.  Shifting by a negative number
of bits is also undefined.

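A brief sketch of in-range shifts.  The last line is included because
the additive operators bind more tightly than the shift operators in
the precedence table, which is an easy thing to trip over when
building masks.

```perl
use strict;
use warnings;

my $left  = 1 << 4;         # shift left by four bits
my $right = 256 >> 2;       # shift right by two bits
my $mask  = (1 << 8) - 1;   # low eight bits set
my $oops  = 1 << 8 - 1;     # parsed as 1 << (8 - 1)

print "$left $right $mask $oops\n";   # prints "16 64 255 128"
```
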
=head2 Named Unary Operators

The various named unary operators are treated as functions with one
argument, with optional parentheses.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.  For example,
because named unary operators are higher precedence than ||:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc. are
treated like named unary operators, but they don't follow this functional
parenthesis rule.  That means, for example, that C<-f($file).".bak"> is
equivalent to C<-f "$file.bak">.

See also L<"Terms and List Operators (Leftward)">.

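Because C<chdir> and C<rand> have side effects or random results, the
same precedence rules are easier to check with C<length>, another named
unary operator.  This is only a sketch; the variable names are
illustrative.

```perl
use strict;
use warnings;

my $n = '12';

my $unary_first = length $n == 2;    # (length $n) == 2
my $plus_first  = length $n + 1;     # length($n + 1), that is, length(13)
my $call_first  = length($n) + 1;    # (length $n) + 1

print "$unary_first $plus_first $call_first\n";   # prints "1 2 3"
```
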
=head2 Relational Operators

Binary "<" returns true if the left argument is numerically less than
the right argument.

Binary ">" returns true if the left argument is numerically greater
than the right argument.

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.

Binary "lt" returns true if the left argument is stringwise less than
the right argument.

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.

=head2 Equality Operators

Binary "==" returns true if the left argument is numerically equal to
the right argument.

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.

Binary "<=>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument.  If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef.  NaN is not "<", "==", ">",
"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.

    perl -le '$a = NaN; print "No NaN support here" if $a == $a'
    perl -le '$a = NaN; print "NaN support here" if $a != $a'

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.

Binary "cmp" returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if C<use locale> is in effect.  See L<perllocale>.

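The difference between the numeric and stringwise comparison operators
shows up most clearly when sorting.  The sketch below assumes
C<use locale> is not in effect, so C<cmp> compares by character code.

```perl
use strict;
use warnings;

my @values = (10, 9, 100, 1);

my @numeric = sort { $a <=> $b } @values;   # numeric order
my @string  = sort { $a cmp $b } @values;   # string order

my @pair = (2 <=> 10, 2 cmp 10);            # same operands, different answers

print "@numeric | @string | @pair\n";       # prints "1 9 10 100 | 1 10 100 9 | -1 1"
```
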
=head2 Bitwise And

Binary "&" returns its operands ANDed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "&" has lower priority than relational operators, so for example
the brackets are essential in a test like

    print "Even\n" if ($x & 1) == 0;

=head2 Bitwise Or and Exclusive Or

Binary "|" returns its operands ORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Binary "^" returns its operands XORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "|" and "^" have lower priority than relational operators, so
for example the brackets are essential in a test like

    print "false\n" if (8 | 2) != 10;

=head2 C-style Logical And

Binary "&&" performs a short-circuit logical AND operation.  That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or

Binary "||" performs a short-circuit logical OR operation.  That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

The C<||> and C<&&> operators return the last value evaluated
(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
portable way to find out the home directory might be:

    $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
        (getpwuid($<))[7] || die "You're homeless!\n";

In particular, this means that you shouldn't use this
for selecting between two aggregates for assignment:

    @a = @b || @c;              # this is wrong
    @a = scalar(@b) || @c;      # really meant this
    @a = @b ? @b : @c;          # this works fine, though

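A runnable sketch of the same points, with concrete data so the
results can be inspected.  The hash and array names are illustrative.

```perl
use strict;
use warnings;

my %opt  = (name => '');
my $name = $opt{name} || 'anonymous';   # '' is false, so the right operand is returned
my $both = 'x' && 'y';                  # both true, so the last operand is returned

my @b = (1, 2, 3);
my @c = (9);
my @wrong = @b || @c;                   # @b is evaluated in scalar context, giving its count
my @right = @b ? @b : @c;               # selects the whole array

print "$name $both | @wrong | @right\n";   # prints "anonymous y | 3 | 1 2 3"
```
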
As more readable alternatives to C<&&> and C<||> when used for
control flow, Perl provides C<and> and C<or> operators (see below).
The short-circuit behavior is identical.  The precedence of "and" and
"or" is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
        or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
        || (gripe(), next LINE);

Using "or" for assignment is unlikely to do what you want; see below.

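The assignment pitfall comes from "or" binding more loosely than "=".
In this sketch the right-hand side of "or" is an assignment to a
separate flag (an illustrative variable) so that its evaluation is
visible.

```perl
use strict;
use warnings;

my $unset;

my $with_pipes = $unset || 'default';     # || binds tighter than =

my ($with_or, $fallback_ran) = (undef, 0);
$with_or = $unset or $fallback_ran = 1;   # parsed as ($with_or = $unset) or ($fallback_ran = 1)

printf "%s %s %d\n",
    $with_pipes,
    defined $with_or ? 'defined' : 'undef',
    $fallback_ran;                        # prints "default undef 1"
```
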
=head2 Range Operators

Binary ".." is the range operator, which is really two different
operators depending on the context.  In list context, it returns a
list of values counting (up by ones) from the left value to the right
value.  If the left value is greater than the right value then it
returns the empty list.  The range operator is useful for writing
C<foreach (1..10)> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:

    for (1 .. 1_000_000) {
        # code
    }

The range operator also works on strings, using the magical
auto-increment; see below.

In scalar context, ".." returns a boolean value.  The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors.  Each ".." operator maintains its
own boolean state.  It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again.  It doesn't become false till the next time the range operator is
evaluated.  It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next
evaluation, as in B<sed>, just use three dots ("...") instead of
two.  In all other regards, "..." behaves just like ".." does.

The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state.  The precedence is a little lower
than || and &&.  The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true.  The
sequence number is reset for each range encountered.  The final
sequence number in a range has the string "E0" appended to it, which
doesn't affect its numeric value, but gives you something to search
for if you want to exclude the endpoint.  You can exclude the
beginning point by waiting for the sequence number to be greater
than 1.

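The sequence numbers and the "E0" marker can be observed without
reading a file by running the flip-flop over an in-memory list.  This
is only a sketch: the C<BEGIN> and C<END> marker lines and the
variable names are made up for illustration, and both operands are
pattern matches rather than constants.

```perl
use strict;
use warnings;

my @lines = ('intro', 'BEGIN', 'alpha', 'beta', 'END', 'outro');
my (@seen, @inner);

for my $line (@lines) {
    # Scalar assignment puts ".." in scalar (flip-flop) context.
    my $state = ($line =~ /^BEGIN$/) .. ($line =~ /^END$/);
    next unless $state;                   # empty string while outside the range

    push @seen, "$line=$state";

    # Exclude both endpoints: skip sequence number 1 and the "E0" value.
    push @inner, $line if $state > 1 && $state !~ /E0$/;
}

print "@seen\n";    # prints "BEGIN=1 alpha=2 beta=3 END=4E0"
print "@inner\n";   # prints "alpha beta"
```
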
If either operand of scalar ".." is a constant expression,
that operand is considered true if it is equal (C<==>) to the current
input line number (the C<$.> variable).

To be pedantic, the comparison is actually C<int(EXPR) == int(EXPR)>,
but that is only an issue if you use a floating point expression; when
implicitly using C<$.> as described in the previous paragraph, the
comparison is C<int(EXPR) == int($.)> which is only an issue when C<$.>
is set to a floating point value and you are not reading from a file.
Furthermore, C<"span" .. "spat"> or C<2.18 .. 3.14> will not do what
you want in scalar context because each of the operands is evaluated
using its integer representation.

501*0Sstevel@tonic-gateExamples:
502*0Sstevel@tonic-gate
503*0Sstevel@tonic-gateAs a scalar operator:
504*0Sstevel@tonic-gate
505*0Sstevel@tonic-gate    if (101 .. 200) { print; } # print 2nd hundred lines, short for
506*0Sstevel@tonic-gate                               #   if ($. == 101 .. $. == 200) ...
507*0Sstevel@tonic-gate    next line if (1 .. /^$/);  # skip header lines, short for
508*0Sstevel@tonic-gate                               #   ... if ($. == 1 .. /^$/);
509*0Sstevel@tonic-gate    s/^/> / if (/^$/ .. eof());	# quote body
510*0Sstevel@tonic-gate
511*0Sstevel@tonic-gate    # parse mail messages
512*0Sstevel@tonic-gate    while (<>) {
513*0Sstevel@tonic-gate        $in_header =   1  .. /^$/;
514*0Sstevel@tonic-gate        $in_body   = /^$/ .. eof;
515*0Sstevel@tonic-gate        if ($in_header) {
516*0Sstevel@tonic-gate            # ...
517*0Sstevel@tonic-gate        } else { # in body
518*0Sstevel@tonic-gate            # ...
519*0Sstevel@tonic-gate        }
520*0Sstevel@tonic-gate    } continue {
521*0Sstevel@tonic-gate        close ARGV if eof;             # reset $. each file
522*0Sstevel@tonic-gate    }
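
To see the implicit comparison with C<$.> in isolation, the following
sketch reads from an in-memory filehandle and keeps lines 2 through 3.
The five-line sample text is made up, and opening a reference to a
scalar as a file assumes a Perl built with PerlIO (5.8 or later).

    my $text = join '', map { "line $_\n" } 1 .. 5;
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    my @picked;
    while (<$fh>) {
        push @picked, $_ if 2 .. 3;     # short for $. == 2 .. $. == 3
    }
    close $fh;
    print @picked;                      # prints "line 2" and "line 3"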

As a list operator:

    for (101 .. 200) { print; }     # print $_ 100 times
    @foo = @foo[0 .. $#foo];        # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];  # slice last 5 items

The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings.  You
can say

    @alphabet = ('A' .. 'Z');

to get all normal letters of the English alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday - 1];

to get dates with leading zeros.  If the final value specified is not
in the sequence that the magical increment would produce, the sequence
goes until the next value would be longer than the final value
specified.

Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
return two elements in list context.

    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
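
A few more string ranges, offered as a sketch with arbitrary
endpoints, show the magical increment carrying into a new column and
stopping when the final value is never reached:

    my @cols  = ('x' .. 'ab');      # x y z aa ab
    my @days  = ('08' .. '11');     # 08 09 10 11
    my @short = ('w' .. '?');       # w x y z ("aa" would be too long)
    print "@cols\n@days\n@short\n";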

=head2 Conditional Operator

Ternary "?:" is the conditional operator, just as in C.  It works much
like an if-then-else.  If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned.  For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble.  For example, this:

    $a % 2 ? $a += 10 : $a += 2

Really means this:

    (($a % 2) ? ($a += 10) : $a) += 2

Rather than this:

    ($a % 2) ? ($a += 10) : ($a += 2)

That should probably be written more simply as:

    $a += ($a % 2) ? 10 : 2;
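
Running the unparenthesized form on an odd number makes the surprise
visible.  This is a small sketch; the variable names and the starting
value 3 are arbitrary.

    my $n = 3;
    $n % 2 ? $n += 10 : $n += 2;    # parsed as (... ? ($n += 10) : $n) += 2
    print "$n\n";                   # prints 15, not 13

    my $m = 3;
    $m += ($m % 2) ? 10 : 2;        # the clearer form
    print "$m\n";                   # prints 13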

=head2 Assignment Operators

"=" is the ordinary assignment operator.

Assignment operators work as in C.  That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie().  Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
                 x=

Although these are grouped by family, they all have the precedence
of assignment.
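
One way to observe that the lvalue's side effects are not duplicated
is to give it an index expression that changes a variable.  In this
sketch (the names are illustrative) C<$i++> runs exactly once:

    my @count = (10, 20, 30);
    my $i = 0;
    $count[$i++] += 2;              # index expression evaluated once
    print "$i @count\n";            # prints "1 12 20 30"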

Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
then modifying the variable that was assigned to.  This is useful
for modifying a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;

Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
the number of elements produced by the expression on the right hand
side of the assignment.
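
The scalar-context rule is what makes the common counting idiom work.
A short sketch, with sample values chosen only for illustration:

    my $count = () = "a1b22c333" =~ /\d+/g;     # 3 matches on the right
    my $pairs = (my ($first, $second) = (5, 6, 7));
    print "$count $pairs\n";                    # prints "3 3"

Note that C<$pairs> is 3 even though only two variables were assigned:
the count reflects the right hand side, not the left.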

=head2 Comma Operator

Binary "," is the comma operator.  In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value.  This is just like C's comma operator.

In list context, it's just the list argument separator, and inserts
both its arguments into the list.

The C<< => >> operator is a synonym for the comma, but forces any word
to its left to be interpreted as a string (as of 5.001).  It is helpful
in documenting the correspondence between keys and values in hashes,
and other paired elements in lists.

=head2 List Operators (Rightward)

The right side of a list operator has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
"and", "or", and "not", which may be used to evaluate calls to list
operators without the need for extra parentheses:

    open HANDLE, "filename"
        or die "Can't open: $!\n";

See also discussion of list operators in L<Terms and List Operators (Leftward)>.

=head2 Logical Not

Unary "not" returns the logical negation of the expression to its right.
It's the equivalent of "!" except for the very low precedence.

=head2 Logical And

Binary "and" returns the logical conjunction of the two surrounding
expressions.  It's equivalent to && except for the very low
precedence.  This means that it short-circuits: i.e., the right
expression is evaluated only if the left expression is true.

=head2 Logical or and Exclusive Or

Binary "or" returns the logical disjunction of the two surrounding
expressions.  It's equivalent to || except for the very low precedence.
This makes it useful for control flow:

    print FH $data              or die "Can't write to FH: $!";

Like ||, it short-circuits: i.e., the right expression is evaluated
only if the left expression is false.  Due to its precedence, you should
probably avoid using this for assignment, only for control flow.

    $a = $b or $c;              # bug: this is wrong
    ($a = $b) or $c;            # really means this
    $a = $b || $c;              # better written this way

However, when it's a list-context assignment and you're trying to use
"||" for control flow, you probably need "or" so that the assignment
takes higher precedence.

    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due

Then again, you could always use parentheses.

Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short circuit, of course.
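
The difference in evaluation can be made visible with a helper that
records each call.  This is only a sketch; C<probe> is a hypothetical
function written for the example.

    my @seen;
    sub probe { my ($name, $value) = @_; push @seen, $name; return $value }

    my $or  = (probe('or-left', 1)  or  probe('or-right', 1));
    my $xor = (probe('xor-left', 1) xor probe('xor-right', 1));

    print "@seen\n";                # prints "or-left xor-left xor-right"
    print $xor ? "true" : "false", "\n";   # prints "false"

The parentheses around each expression are needed because assignment
binds more tightly than "or" and "xor".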

=head2 C Operators Missing From Perl

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator.  (But see the "\" operator for taking a reference.)

=item unary *

Dereference-address operator.  (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)

=item (TYPE)

Type-casting operator.

=back

=head2 Quote and Quote-like Operators

While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities.  Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them.  In the following table, a C<{}> represents
any pair of delimiters you choose.

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.

Non-bracketing delimiters use the same character fore and aft, but the four
sorts of brackets (round, angle, square, curly) will all nest, which means
that

        q{foo{bar}baz}

is the same as

        'foo{bar}baz'

Note, however, that this does not always work for quoting Perl code:

        $s = q{ if($a eq "}") ... }; # WRONG

is a syntax error.  The C<Text::Balanced> module (from CPAN, and
starting from Perl 5.8 part of the standard distribution) is able
to do this properly.

There can be whitespace between the operator and the quoting
characters, except when C<#> is being used as the quoting character.
C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
operator C<q> followed by a comment.  Its argument will be taken
from the next line.  This allows you to write:

    s {foo}  # Replace foo
      {bar}  # with bar.

The following escape sequences are available in constructs that interpolate
and in transliterations.

    \t          tab             (HT, TAB)
    \n          newline         (NL)
    \r          return          (CR)
    \f          form feed       (FF)
    \b          backspace       (BS)
    \a          alarm (bell)    (BEL)
    \e          escape          (ESC)
    \033        octal char      (ESC)
    \x1b        hex char        (ESC)
    \x{263a}    wide hex char   (SMILEY)
    \c[         control char    (ESC)
    \N{name}    named Unicode character

B<NOTE>: Unlike C and other languages, Perl has no \v escape sequence for
the vertical tab (VT - ASCII 11).

The following escape sequences are available in constructs that interpolate
but not in transliterations.

    \l          lowercase next char
    \u          uppercase next char
    \L          lowercase till \E
    \U          uppercase till \E
    \E          end case modification
    \Q          quote non-word characters till \E

If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
C<\u> and C<\U> is taken from the current locale.  See L<perllocale>.
If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
C<\U> is as defined by Unicode.  For documentation of C<\N{name}>,
see L<charnames>.
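
As a quick illustration of the case-modification and quoting escapes,
here is a sketch using an arbitrary ASCII sample string (so neither
locale nor Unicode rules come into play):

    my $name  = "perl MONGER";
    my $title = "\u\L$name\E!";     # lowercase all, then uppercase first
    my $loud  = "\U$name\E";        # uppercase till \E
    my $safe  = "\Qa.b*c\E";        # backslash the non-word characters
    print "$title\n$loud\n$safe\n";
    # prints "Perl monger!", "PERL MONGER" and "a\.b\*c"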

All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline".  There is no such thing as an unvarying, physical
newline character.  It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve.  Not all
systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF.  For example,
on a Mac, these are reversed, and on systems without a line terminator,
printing C<"\n"> may emit no actual data.  In general, use C<"\n"> when
you mean a "newline" for your system, but use the literal ASCII when you
need an exact character.  For example, most networking protocols expect
and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
and although they often accept just C<"\012">, they seldom tolerate just
C<"\015">.  If you get in the habit of using C<"\n"> for networking,
you may be burned some day.

For constructs that do interpolate, variables beginning with "C<$>"
or "C<@>" are interpolated.  Subscripted variables such as C<$a[3]> or
C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
But method calls such as C<< $obj->meth >> are not.

Interpolating an array or slice interpolates the elements in order,
separated by the value of C<$">, so is equivalent to interpolating
C<join $", @array>.  "Punctuation" arrays such as C<@+> are only
interpolated if the name is enclosed in braces C<@{+}>.
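
The effect of C<$"> on interpolated arrays and slices can be sketched
as follows; the array and hash contents are invented for the example.

    my @parts = ('usr', 'local', 'bin');
    my %opt   = (mode => 'fast');
    my ($path, $slice);
    {
        local $" = '/';             # separator for interpolated arrays
        $path  = "/@parts";         # "/usr/local/bin"
        $slice = "@parts[0, 1]";    # "usr/local"
    }
    my $plain = "@parts $opt{mode}";    # "usr local bin fast"
    print "$path\n$slice\n$plain\n";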

You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
An unescaped C<$> or C<@> interpolates the corresponding variable,
while escaping will cause the literal string C<\$> to be inserted.
You'll need to write something like C<m/\Quser\E\@\Qhost/>.

Patterns are subject to an additional level of interpretation as a
regular expression.  This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
pattern from the variables.  If this is not what you want, use C<\Q> to
interpolate a variable literally.
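
The following sketch contrasts the two passes.  The strings are
arbitrary test data; the last line follows the C<\Q...\E\@\Q...>
shape suggested above.

    my $needle = 'a.c';             # the "." is meant literally
    my $loose  = ('abc' =~ /$needle/)     ? 1 : 0;  # "." matches any char
    my $strict = ('abc' =~ /\Q$needle\E/) ? 1 : 0;  # needs a real "."
    my $mail   = ('j.doe@example.com' =~ /\Qj.doe\E\@\Qexample.com\E/) ? 1 : 0;
    print "$loose $strict $mail\n";                 # prints "1 0 1"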

Apart from the behavior described above, Perl does not expand
multiple levels of interpolation.  In particular, contrary to the
expectations of shell programmers, back-quotes do I<NOT> interpolate
within double quotes, nor do single quotes impede evaluation of
variables when used within double quotes.

=head2 Regexp Quote-Like Operators

Here are the quote-like operators that apply to pattern
matching and related activities.

=over 8

=item ?PATTERN?

This is just like the C</pattern/> search, except that it matches only
once between calls to the reset() operator.  This is a useful
optimization when you want to see only the first occurrence of
something in each file of a set of files, for instance.  Only C<??>
patterns local to the current package are reset.

    while (<>) {
        if (?^$?) {
                            # blank line between header and body
        }
    } continue {
        reset if eof;       # clear ?? status for next file
    }

This usage is vaguely deprecated, which means it just might possibly
be removed in some distant future version of Perl, perhaps somewhere
around the year 2168.
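
The match-once rule and the effect of reset() can be seen in the
sketch below, which uses in-memory sample data instead of files.  It
is written with the explicit C<m?PATTERN?> spelling, which follows the
same match-only-once rule and is the only form newer Perl releases
accept.

    my @first;
    for my $round (1, 2) {
        for my $line ('x=1', 'x=2', 'y=3') {
            push @first, "$round:$line" if $line =~ m?^x=?;
        }
        reset;                      # re-arm the match-once pattern
    }
    print "@first\n";               # prints "1:x=1 2:x=1"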

=item m/PATTERN/cgimosx

=item /PATTERN/cgimosx

Searches a string for a pattern match, and in scalar context returns
true if it succeeds, false if it fails.  If no string is specified
via the C<=~> or C<!~> operator, the $_ string is searched.  (The
string specified with C<=~> need not be an lvalue--it may be the
result of an expression evaluation, but remember the C<=~> binds
rather tightly.)  See also L<perlre>.  See L<perllocale> for
discussion of additional considerations that apply when C<use locale>
is in effect.

Options are:

    c   Do not reset search position on a failed match when /g is in effect.
    g   Match globally, i.e., find all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.

If "/" is the delimiter then the initial C<m> is optional.  With the C<m>
you can use any pair of non-alphanumeric, non-whitespace characters
as delimiters.  This is particularly useful for matching path names
that contain "/", to avoid LTS (leaning toothpick syndrome).  If "?" is
the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
If "'" is the delimiter, no interpolation is performed on the PATTERN.

PATTERN may contain variables, which will be interpolated (and the
pattern recompiled) every time the pattern search is evaluated, except
for when the delimiter is a single quote.  (Note that C<$(>, C<$)>, and
C<$|> are not interpolated because they look like end-of-string tests.)
If you want such a pattern to be compiled only once, add a C</o> after
the trailing delimiter.  This avoids expensive run-time recompilations,
and is useful when the value you are interpolating won't change over
the life of the script.  However, mentioning C</o> constitutes a promise
that you won't change the variables in the pattern.  If you change them,
Perl won't even notice.  See also L<"qr/STRING/imosx">.
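
What breaking that promise looks like can be sketched with a loop
that changes the interpolated variable between passes (the patterns
and target string are arbitrary):

    my @hits;
    for my $pat ('a', 'b') {
        # /o freezes the pattern as compiled on the first pass ('a'),
        # so the change to $pat on the second pass goes unnoticed
        push @hits, ('b' =~ /$pat/o) ? 'match' : 'no match';
    }
    print join(', ', @hits), "\n";  # prints "no match, no match"

Without the C</o>, the second pass would recompile the pattern and
report a match.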

If the PATTERN evaluates to the empty string, the last
I<successfully> matched regular expression is used instead.  In this
case, only the C<g> and C<c> flags on the empty pattern are honoured;
the other flags are taken from the original pattern.  If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match).
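
A minimal sketch of the reuse, with throwaway strings: after C</o+/>
has succeeded, an empty pattern stands in for it rather than matching
everything.

    my $found = ('foo' =~ /o+/) ? 1 : 0;    # /o+/ is now the last success
    my $reuse = ('bar' =~ //)   ? 1 : 0;    # // stands in for /o+/
    print "$found $reuse\n";                # prints "1 0"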

If the C</g> option is not used, C<m//> in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, i.e., (C<$1>, C<$2>, C<$3>...).  (Note that here C<$1> etc. are
also set, and that this differs from Perl 4's behavior.)  When there are
no parentheses in the pattern, the return value is the list C<(1)> for
success.  With or without parentheses, an empty list is returned upon
failure.
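
The three return shapes described above can be checked directly.  A
sketch with made-up input strings:

    my ($hour, $min) = '12:34' =~ /^(\d+):(\d+)$/;  # ('12', '34')
    my @plain        = 'abc'   =~ /b/;              # (1): no parentheses
    my @failed       = 'abc'   =~ /(x)/;            # (): match failed
    print "$hour $min ", scalar(@plain), " ", scalar(@failed), "\n";
    # prints "12 34 1 0"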

Examples:

    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the
remainder of the line, and assigns those three fields to $F1, $F2, and
$Etc.  The conditional is true if any variables were assigned, i.e., if
the pattern matched.

The C</g> modifier specifies global pattern matching--that is,
matching as many times as possible within the string.  How it behaves
depends on the context.  In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
expression.  If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.
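
Both list-context cases, without and with capturing parentheses, in a
short sketch (the input strings are invented):

    my @nums = 'a1b22c333' =~ /\d+/g;               # ('1', '22', '333')
    my %conf = 'x=1,y=2'   =~ /(\w+)=(\d+)/g;       # (x => '1', y => '2')
    print "@nums $conf{x} $conf{y}\n";              # prints "1 22 333 1 2"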

In scalar context, each execution of C<m//g> finds the next match,
returning true if it matches, and false if there is no further match.
The position after the last match can be read or set using the pos()
function; see L<perlfunc/pos>.  A failed match normally resets the
search position to the beginning of the string, but you can avoid that
by adding the C</c> modifier (e.g. C<m//gc>).  Modifying the target
string also resets the search position.
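
A small sketch of the scalar-context loop, recording pos() after each
match and checking that the final failed match cleared it (the target
string is arbitrary):

    my $text = 'aXbXc';
    my @ends;
    while ($text =~ /X/g) {
        push @ends, pos($text);     # offset just past each match
    }
    print "@ends\n";                # prints "2 4"
    print defined(pos($text)) ? "set" : "reset", "\n";  # prints "reset"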

You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
zero-width assertion that matches the exact position where the previous
C<m//g>, if any, left off.  Without the C</g> modifier, the C<\G> assertion
still anchors at pos(), but the match is of course only attempted once.
Using C<\G> without C</g> on a target string that has not previously had a
C</g> match applied to it is the same as using the C<\A> assertion to match
the beginning of the string.  Note also that, currently, C<\G> is only
properly supported when anchored at the very beginning of the pattern.

Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = "";
    while (defined($paragraph = <>)) {
        while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";

    # using m//gc with \G
    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
    Final: 'q', pos=8

Notice that the final match matched C<q> instead of C<p>, which a match
without the C<\G> anchor would have done.  Also note that the final match
did not update C<pos> -- C<pos> is only updated on a C</g> match.  If the
final match did indeed match C<p>, it's a good bet that you're running an
older (pre-5.6.0) Perl.

A useful idiom for C<lex>-like scanners is C</\G.../gc>.  You can
combine several regexps like this to process a string part-by-part,
doing different actions depending on which regexp matched.  Each
regexp tries to match where the previous one leaves off.

 $_ = <<'EOL';
      $url = new URI::URL "http://www/";   die if $url eq "xXx";
 EOL
 LOOP:
    {
      print(" digits"),         redo LOOP if /\G\d+\b[,.;]?\s*/gc;
      print(" lowercase"),      redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
      print(" UPPERCASE"),      redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
      print(" Capitalized"),    redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
      print(" MiXeD"),          redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
      print(" alphanumeric"),   redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
      print(" line-noise"),     redo LOOP if /\G[^A-Za-z0-9]+/gc;
      print ". That's all!\n";
    }

Here is the output (split into several lines):

 line-noise lowercase line-noise lowercase UPPERCASE line-noise
 UPPERCASE line-noise lowercase line-noise lowercase line-noise
 lowercase lowercase line-noise lowercase lowercase line-noise
 MiXeD line-noise. That's all!
1050*0Sstevel@tonic-gate
1051*0Sstevel@tonic-gate=item q/STRING/
1052*0Sstevel@tonic-gate
1053*0Sstevel@tonic-gate=item C<'STRING'>
1054*0Sstevel@tonic-gate
1055*0Sstevel@tonic-gateA single-quoted, literal string.  A backslash represents a backslash
1056*0Sstevel@tonic-gateunless followed by the delimiter or another backslash, in which case
1057*0Sstevel@tonic-gatethe delimiter or backslash is interpolated.
1058*0Sstevel@tonic-gate
1059*0Sstevel@tonic-gate    $foo = q!I said, "You said, 'She said it.'"!;
1060*0Sstevel@tonic-gate    $bar = q('This is it.');
1061*0Sstevel@tonic-gate    $baz = '\n';		# a two-character string
1062*0Sstevel@tonic-gate
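The backslash rule is easiest to see by checking string lengths.  This is a small sketch; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $newline_text = '\n';      # backslash is kept: two characters, \ and n
my $apostrophe   = 'it\'s';   # \' yields the delimiter itself
my $one_slash    = '\\';      # \\ yields a single backslash

printf "%d %s %d\n",
    length($newline_text), $apostrophe, length($one_slash);
```
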
1063*0Sstevel@tonic-gate=item qq/STRING/
1064*0Sstevel@tonic-gate
1065*0Sstevel@tonic-gate=item "STRING"
1066*0Sstevel@tonic-gate
1067*0Sstevel@tonic-gateA double-quoted, interpolated string.
1068*0Sstevel@tonic-gate
1069*0Sstevel@tonic-gate    $_ .= qq
1070*0Sstevel@tonic-gate     (*** The previous line contains the naughty word "$1".\n)
1071*0Sstevel@tonic-gate		if /\b(tcl|java|python)\b/i;      # :-)
1072*0Sstevel@tonic-gate    $baz = "\n";		# a one-character string
1073*0Sstevel@tonic-gate
1074*0Sstevel@tonic-gate=item qr/STRING/imosx
1075*0Sstevel@tonic-gate
1076*0Sstevel@tonic-gateThis operator quotes (and possibly compiles) its I<STRING> as a regular
1077*0Sstevel@tonic-gateexpression.  I<STRING> is interpolated the same way as I<PATTERN>
1078*0Sstevel@tonic-gatein C<m/PATTERN/>.  If "'" is used as the delimiter, no interpolation
1079*0Sstevel@tonic-gateis done.  Returns a Perl value which may be used instead of the
1080*0Sstevel@tonic-gatecorresponding C</STRING/imosx> expression.
1081*0Sstevel@tonic-gate
1082*0Sstevel@tonic-gateFor example,
1083*0Sstevel@tonic-gate
1084*0Sstevel@tonic-gate    $rex = qr/my.STRING/is;
1085*0Sstevel@tonic-gate    s/$rex/foo/;
1086*0Sstevel@tonic-gate
1087*0Sstevel@tonic-gateis equivalent to
1088*0Sstevel@tonic-gate
1089*0Sstevel@tonic-gate    s/my.STRING/foo/is;
1090*0Sstevel@tonic-gate
1091*0Sstevel@tonic-gateThe result may be used as a subpattern in a match:
1092*0Sstevel@tonic-gate
1093*0Sstevel@tonic-gate    $re = qr/$pattern/;
1094*0Sstevel@tonic-gate    $string =~ /foo${re}bar/;	# can be interpolated in other patterns
1095*0Sstevel@tonic-gate    $string =~ $re;		# or used standalone
1096*0Sstevel@tonic-gate    $string =~ /$re/;		# or this way
1097*0Sstevel@tonic-gate
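Because the modifiers are stored with the compiled value, they apply only to the interpolated part and not to the enclosing pattern.  A short sketch, with made-up strings and variable names:

```perl
use strict;
use warnings;

my $word = qr/perl/i;         # /i travels with the compiled value

# Used standalone.
my $standalone = "I like PERL" =~ $word ? 1 : 0;

# Interpolated into a pattern that has no /i of its own.
my $embedded = "use Perl;" =~ /^use ${word};$/ ? 1 : 0;

# The literal "use" of the outer pattern is still case-sensitive.
my $outer_case = "USE perl;" =~ /^use ${word};$/ ? 1 : 0;

print "$standalone $embedded $outer_case\n";
```
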
Since Perl may compile the pattern at the moment the qr() operator is
executed, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:
1101*0Sstevel@tonic-gate
1102*0Sstevel@tonic-gate    sub match {
1103*0Sstevel@tonic-gate	my $patterns = shift;
1104*0Sstevel@tonic-gate	my @compiled = map qr/$_/i, @$patterns;
1105*0Sstevel@tonic-gate	grep {
1106*0Sstevel@tonic-gate	    my $success = 0;
1107*0Sstevel@tonic-gate	    foreach my $pat (@compiled) {
1108*0Sstevel@tonic-gate		$success = 1, last if /$pat/;
1109*0Sstevel@tonic-gate	    }
1110*0Sstevel@tonic-gate	    $success;
1111*0Sstevel@tonic-gate	} @_;
1112*0Sstevel@tonic-gate    }
1113*0Sstevel@tonic-gate
Precompilation of the pattern into an internal representation at
the moment of qr() avoids the need to recompile the pattern every
time a match C</$pat/> is attempted.  (Perl has many other internal
optimizations, but none would be triggered in the above example if
we did not use the qr() operator.)
1119*0Sstevel@tonic-gate
1120*0Sstevel@tonic-gateOptions are:
1121*0Sstevel@tonic-gate
1122*0Sstevel@tonic-gate    i	Do case-insensitive pattern matching.
1123*0Sstevel@tonic-gate    m	Treat string as multiple lines.
1124*0Sstevel@tonic-gate    o	Compile pattern only once.
1125*0Sstevel@tonic-gate    s	Treat string as single line.
1126*0Sstevel@tonic-gate    x	Use extended regular expressions.
1127*0Sstevel@tonic-gate
1128*0Sstevel@tonic-gateSee L<perlre> for additional information on valid syntax for STRING, and
1129*0Sstevel@tonic-gatefor a detailed look at the semantics of regular expressions.
1130*0Sstevel@tonic-gate
1131*0Sstevel@tonic-gate=item qx/STRING/
1132*0Sstevel@tonic-gate
1133*0Sstevel@tonic-gate=item `STRING`
1134*0Sstevel@tonic-gate
1135*0Sstevel@tonic-gateA string which is (possibly) interpolated and then executed as a
1136*0Sstevel@tonic-gatesystem command with C</bin/sh> or its equivalent.  Shell wildcards,
1137*0Sstevel@tonic-gatepipes, and redirections will be honored.  The collected standard
1138*0Sstevel@tonic-gateoutput of the command is returned; standard error is unaffected.  In
1139*0Sstevel@tonic-gatescalar context, it comes back as a single (potentially multi-line)
1140*0Sstevel@tonic-gatestring, or undef if the command failed.  In list context, returns a
1141*0Sstevel@tonic-gatelist of lines (however you've defined lines with $/ or
1142*0Sstevel@tonic-gate$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
1143*0Sstevel@tonic-gate
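The difference between the two contexts can be seen with a command that prints two lines.  This sketch assumes a POSIX-like shell in which C<echo> is available; the variable names are illustrative.

```perl
use strict;
use warnings;

# Assumption: a POSIX-like shell with an "echo" command.
my $scalar = `echo one; echo two`;   # scalar context: one multi-line string
my @lines  = `echo one; echo two`;   # list context: one element per line

print "scalar holds ", length($scalar), " characters\n";
print "list holds ", scalar(@lines), " lines\n";
```

Each element of C<@lines> keeps its trailing newline; use C<chomp(@lines)> if that is not wanted.
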
1144*0Sstevel@tonic-gateBecause backticks do not affect standard error, use shell file descriptor
1145*0Sstevel@tonic-gatesyntax (assuming the shell supports this) if you care to address this.
1146*0Sstevel@tonic-gateTo capture a command's STDERR and STDOUT together:
1147*0Sstevel@tonic-gate
1148*0Sstevel@tonic-gate    $output = `cmd 2>&1`;
1149*0Sstevel@tonic-gate
1150*0Sstevel@tonic-gateTo capture a command's STDOUT but discard its STDERR:
1151*0Sstevel@tonic-gate
1152*0Sstevel@tonic-gate    $output = `cmd 2>/dev/null`;
1153*0Sstevel@tonic-gate
1154*0Sstevel@tonic-gateTo capture a command's STDERR but discard its STDOUT (ordering is
1155*0Sstevel@tonic-gateimportant here):
1156*0Sstevel@tonic-gate
1157*0Sstevel@tonic-gate    $output = `cmd 2>&1 1>/dev/null`;
1158*0Sstevel@tonic-gate
1159*0Sstevel@tonic-gateTo exchange a command's STDOUT and STDERR in order to capture the STDERR
1160*0Sstevel@tonic-gatebut leave its STDOUT to come out the old STDERR:
1161*0Sstevel@tonic-gate
1162*0Sstevel@tonic-gate    $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
1163*0Sstevel@tonic-gate
1164*0Sstevel@tonic-gateTo read both a command's STDOUT and its STDERR separately, it's easiest
1165*0Sstevel@tonic-gateto redirect them separately to files, and then read from those files
1166*0Sstevel@tonic-gatewhen the program is done:
1167*0Sstevel@tonic-gate
1168*0Sstevel@tonic-gate    system("program args 1>program.stdout 2>program.stderr");
1169*0Sstevel@tonic-gate
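A fuller sketch of that approach follows.  It assumes a POSIX-like shell; the command, the file names and the C<slurp> helper are placeholders invented for the example, and the temporary directory comes from the standard C<File::Temp> module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a POSIX-like shell.  The command below is a stand-in.
my $dir = tempdir(CLEANUP => 1);
my ($out_file, $err_file) = ("$dir/cmd.stdout", "$dir/cmd.stderr");

system("sh -c 'echo to-out; echo to-err 1>&2' 1>$out_file 2>$err_file");

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;                         # undefined $/ reads everything at once
    my $text = <$fh>;
    close($fh);
    return defined $text ? $text : '';
}

my $stdout_text = slurp($out_file);
my $stderr_text = slurp($err_file);

print "STDOUT: $stdout_text", "STDERR: $stderr_text";
```
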
1170*0Sstevel@tonic-gateUsing single-quote as a delimiter protects the command from Perl's
1171*0Sstevel@tonic-gatedouble-quote interpolation, passing it on to the shell instead:
1172*0Sstevel@tonic-gate
1173*0Sstevel@tonic-gate    $perl_info  = qx(ps $$);            # that's Perl's $$
1174*0Sstevel@tonic-gate    $shell_info = qx'ps $$';            # that's the new shell's $$
1175*0Sstevel@tonic-gate
1176*0Sstevel@tonic-gateHow that string gets evaluated is entirely subject to the command
1177*0Sstevel@tonic-gateinterpreter on your system.  On most platforms, you will have to protect
1178*0Sstevel@tonic-gateshell metacharacters if you want them treated literally.  This is in
1179*0Sstevel@tonic-gatepractice difficult to do, as it's unclear how to escape which characters.
1180*0Sstevel@tonic-gateSee L<perlsec> for a clean and safe example of a manual fork() and exec()
1181*0Sstevel@tonic-gateto emulate backticks safely.
1182*0Sstevel@tonic-gate
1183*0Sstevel@tonic-gateOn some platforms (notably DOS-like ones), the shell may not be
1184*0Sstevel@tonic-gatecapable of dealing with multiline commands, so putting newlines in
1185*0Sstevel@tonic-gatethe string may not get you what you want.  You may be able to evaluate
1186*0Sstevel@tonic-gatemultiple commands in a single line by separating them with the command
1187*0Sstevel@tonic-gateseparator character, if your shell supports that (e.g. C<;> on many Unix
1188*0Sstevel@tonic-gateshells; C<&> on the Windows NT C<cmd> shell).
1189*0Sstevel@tonic-gate
1190*0Sstevel@tonic-gateBeginning with v5.6.0, Perl will attempt to flush all files opened for
1191*0Sstevel@tonic-gateoutput before starting the child process, but this may not be supported
1192*0Sstevel@tonic-gateon some platforms (see L<perlport>).  To be safe, you may need to set
1193*0Sstevel@tonic-gateC<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
1194*0Sstevel@tonic-gateC<IO::Handle> on any open handles.
1195*0Sstevel@tonic-gate
1196*0Sstevel@tonic-gateBeware that some command shells may place restrictions on the length
1197*0Sstevel@tonic-gateof the command line.  You must ensure your strings don't exceed this
1198*0Sstevel@tonic-gatelimit after any necessary interpolations.  See the platform-specific
1199*0Sstevel@tonic-gaterelease notes for more details about your particular environment.
1200*0Sstevel@tonic-gate
1201*0Sstevel@tonic-gateUsing this operator can lead to programs that are difficult to port,
1202*0Sstevel@tonic-gatebecause the shell commands called vary between systems, and may in
1203*0Sstevel@tonic-gatefact not be present at all.  As one example, the C<type> command under
1204*0Sstevel@tonic-gatethe POSIX shell is very different from the C<type> command under DOS.
1205*0Sstevel@tonic-gateThat doesn't mean you should go out of your way to avoid backticks
1206*0Sstevel@tonic-gatewhen they're the right way to get something done.  Perl was made to be
1207*0Sstevel@tonic-gatea glue language, and one of the things it glues together is commands.
1208*0Sstevel@tonic-gateJust understand what you're getting yourself into.
1209*0Sstevel@tonic-gate
1210*0Sstevel@tonic-gateSee L<"I/O Operators"> for more discussion.
1211*0Sstevel@tonic-gate
1212*0Sstevel@tonic-gate=item qw/STRING/
1213*0Sstevel@tonic-gate
1214*0Sstevel@tonic-gateEvaluates to a list of the words extracted out of STRING, using embedded
1215*0Sstevel@tonic-gatewhitespace as the word delimiters.  It can be understood as being roughly
1216*0Sstevel@tonic-gateequivalent to:
1217*0Sstevel@tonic-gate
1218*0Sstevel@tonic-gate    split(' ', q/STRING/);
1219*0Sstevel@tonic-gate
1220*0Sstevel@tonic-gatethe differences being that it generates a real list at compile time, and
1221*0Sstevel@tonic-gatein scalar context it returns the last element in the list.  So
1222*0Sstevel@tonic-gatethis expression:
1223*0Sstevel@tonic-gate
1224*0Sstevel@tonic-gate    qw(foo bar baz)
1225*0Sstevel@tonic-gate
1226*0Sstevel@tonic-gateis semantically equivalent to the list:
1227*0Sstevel@tonic-gate
1228*0Sstevel@tonic-gate    'foo', 'bar', 'baz'
1229*0Sstevel@tonic-gate
1230*0Sstevel@tonic-gateSome frequently seen examples:
1231*0Sstevel@tonic-gate
    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );
1234*0Sstevel@tonic-gate
A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string.  For this reason, the
C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the STRING contains the "," or the "#" character.
1239*0Sstevel@tonic-gate
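The comma mistake is easy to reproduce.  In this sketch the C<qw> warning category is switched off locally so that the mistaken list can be inspected; the variable names are illustrative.

```perl
use strict;
use warnings;

my @words = qw(foo bar baz);      # same as ('foo', 'bar', 'baz')
my $count = @words;

# Commas are not separators inside qw(); they become part of the words.
my @oops;
{
    no warnings 'qw';             # this mistake normally draws a warning
    @oops = qw(foo, bar, baz);
}

print "$count words; first mistaken word is '$oops[0]'\n";
```
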
1240*0Sstevel@tonic-gate=item s/PATTERN/REPLACEMENT/egimosx
1241*0Sstevel@tonic-gate
1242*0Sstevel@tonic-gateSearches a string for a pattern, and if found, replaces that pattern
1243*0Sstevel@tonic-gatewith the replacement text and returns the number of substitutions
1244*0Sstevel@tonic-gatemade.  Otherwise it returns false (specifically, the empty string).
1245*0Sstevel@tonic-gate
If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified.  (The string specified with C<=~> must
be a scalar variable, an array element, a hash element, or an assignment
to one of those, i.e., an lvalue.)
1250*0Sstevel@tonic-gate
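The return value can be captured directly.  A brief sketch with made-up data:

```perl
use strict;
use warnings;

my $text = "red green red";

my $changed = ($text =~ s/red/blue/g);     # number of substitutions made
my $missed  = ($text =~ s/purple/pink/);   # no match: the empty string

print "text='$text' changed=$changed missed='$missed'\n";
```
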
1251*0Sstevel@tonic-gateIf the delimiter chosen is a single quote, no interpolation is
1252*0Sstevel@tonic-gatedone on either the PATTERN or the REPLACEMENT.  Otherwise, if the
1253*0Sstevel@tonic-gatePATTERN contains a $ that looks like a variable rather than an
1254*0Sstevel@tonic-gateend-of-string test, the variable will be interpolated into the pattern
1255*0Sstevel@tonic-gateat run-time.  If you want the pattern compiled only once the first time
1256*0Sstevel@tonic-gatethe variable is interpolated, use the C</o> option.  If the pattern
1257*0Sstevel@tonic-gateevaluates to the empty string, the last successfully executed regular
1258*0Sstevel@tonic-gateexpression is used instead.  See L<perlre> for further explanation on these.
1259*0Sstevel@tonic-gateSee L<perllocale> for discussion of additional considerations that apply
1260*0Sstevel@tonic-gatewhen C<use locale> is in effect.
1261*0Sstevel@tonic-gate
1262*0Sstevel@tonic-gateOptions are:
1263*0Sstevel@tonic-gate
1264*0Sstevel@tonic-gate    e	Evaluate the right side as an expression.
1265*0Sstevel@tonic-gate    g	Replace globally, i.e., all occurrences.
1266*0Sstevel@tonic-gate    i	Do case-insensitive pattern matching.
1267*0Sstevel@tonic-gate    m	Treat string as multiple lines.
1268*0Sstevel@tonic-gate    o	Compile pattern only once.
1269*0Sstevel@tonic-gate    s	Treat string as single line.
1270*0Sstevel@tonic-gate    x	Use extended regular expressions.
1271*0Sstevel@tonic-gate
1272*0Sstevel@tonic-gateAny non-alphanumeric, non-whitespace delimiter may replace the
1273*0Sstevel@tonic-gateslashes.  If single quotes are used, no interpretation is done on the
1274*0Sstevel@tonic-gatereplacement string (the C</e> modifier overrides this, however).  Unlike
1275*0Sstevel@tonic-gatePerl 4, Perl 5 treats backticks as normal delimiters; the replacement
1276*0Sstevel@tonic-gatetext is not evaluated as a command.  If the
1277*0Sstevel@tonic-gatePATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
1278*0Sstevel@tonic-gatepair of quotes, which may or may not be bracketing quotes, e.g.,
1279*0Sstevel@tonic-gateC<s(foo)(bar)> or C<< s<foo>/bar/ >>.  A C</e> will cause the
1280*0Sstevel@tonic-gatereplacement portion to be treated as a full-fledged Perl expression
1281*0Sstevel@tonic-gateand evaluated right then and there.  It is, however, syntax checked at
1282*0Sstevel@tonic-gatecompile-time. A second C<e> modifier will cause the replacement portion
1283*0Sstevel@tonic-gateto be C<eval>ed before being run as a Perl expression.
1284*0Sstevel@tonic-gate
1285*0Sstevel@tonic-gateExamples:
1286*0Sstevel@tonic-gate
1287*0Sstevel@tonic-gate    s/\bgreen\b/mauve/g;		# don't change wintergreen
1288*0Sstevel@tonic-gate
1289*0Sstevel@tonic-gate    $path =~ s|/usr/bin|/usr/local/bin|;
1290*0Sstevel@tonic-gate
1291*0Sstevel@tonic-gate    s/Login: $foo/Login: $bar/; # run-time pattern
1292*0Sstevel@tonic-gate
1293*0Sstevel@tonic-gate    ($foo = $bar) =~ s/this/that/;	# copy first, then change
1294*0Sstevel@tonic-gate
1295*0Sstevel@tonic-gate    $count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count
1296*0Sstevel@tonic-gate
1297*0Sstevel@tonic-gate    $_ = 'abc123xyz';
1298*0Sstevel@tonic-gate    s/\d+/$&*2/e;		# yields 'abc246xyz'
1299*0Sstevel@tonic-gate    s/\d+/sprintf("%5d",$&)/e;	# yields 'abc  246xyz'
1300*0Sstevel@tonic-gate    s/\w/$& x 2/eg;		# yields 'aabbcc  224466xxyyzz'
1301*0Sstevel@tonic-gate
1302*0Sstevel@tonic-gate    s/%(.)/$percent{$1}/g;	# change percent escapes; no /e
1303*0Sstevel@tonic-gate    s/%(.)/$percent{$1} || $&/ge;	# expr now, so /e
1304*0Sstevel@tonic-gate    s/^=(\w+)/&pod($1)/ge;	# use function call
1305*0Sstevel@tonic-gate
1306*0Sstevel@tonic-gate    # expand variables in $_, but dynamics only, using
1307*0Sstevel@tonic-gate    # symbolic dereferencing
1308*0Sstevel@tonic-gate    s/\$(\w+)/${$1}/g;
1309*0Sstevel@tonic-gate
1310*0Sstevel@tonic-gate    # Add one to the value of any numbers in the string
1311*0Sstevel@tonic-gate    s/(\d+)/1 + $1/eg;
1312*0Sstevel@tonic-gate
1313*0Sstevel@tonic-gate    # This will expand any embedded scalar variable
1314*0Sstevel@tonic-gate    # (including lexicals) in $_ : First $1 is interpolated
1315*0Sstevel@tonic-gate    # to the variable name, and then evaluated
1316*0Sstevel@tonic-gate    s/(\$\w+)/$1/eeg;
1317*0Sstevel@tonic-gate
1318*0Sstevel@tonic-gate    # Delete (most) C comments.
1319*0Sstevel@tonic-gate    $program =~ s {
1320*0Sstevel@tonic-gate	/\*	# Match the opening delimiter.
1321*0Sstevel@tonic-gate	.*?	# Match a minimal number of characters.
1322*0Sstevel@tonic-gate	\*/	# Match the closing delimiter.
1323*0Sstevel@tonic-gate    } []gsx;
1324*0Sstevel@tonic-gate
1325*0Sstevel@tonic-gate    s/^\s*(.*?)\s*$/$1/;	# trim white space in $_, expensively
1326*0Sstevel@tonic-gate
1327*0Sstevel@tonic-gate    for ($variable) {		# trim white space in $variable, cheap
1328*0Sstevel@tonic-gate	s/^\s+//;
1329*0Sstevel@tonic-gate	s/\s+$//;
1330*0Sstevel@tonic-gate    }
1331*0Sstevel@tonic-gate
1332*0Sstevel@tonic-gate    s/([^ ]*) *([^ ]*)/$2 $1/;	# reverse 1st two fields
1333*0Sstevel@tonic-gate
1334*0Sstevel@tonic-gateNote the use of $ instead of \ in the last example.  Unlike
1335*0Sstevel@tonic-gateB<sed>, we use the \<I<digit>> form in only the left hand side.
1336*0Sstevel@tonic-gateAnywhere else it's $<I<digit>>.
1337*0Sstevel@tonic-gate
1338*0Sstevel@tonic-gateOccasionally, you can't use just a C</g> to get all the changes
1339*0Sstevel@tonic-gateto occur that you might want.  Here are two common cases:
1340*0Sstevel@tonic-gate
1341*0Sstevel@tonic-gate    # put commas in the right places in an integer
1342*0Sstevel@tonic-gate    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
1343*0Sstevel@tonic-gate
1344*0Sstevel@tonic-gate    # expand tabs to 8-column spacing
1345*0Sstevel@tonic-gate    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
1346*0Sstevel@tonic-gate
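To see why the loop is needed in the comma case, it helps to count the passes.  In this sketch C<$number> and C<$passes> are illustrative names; for a string holding a single number, each pass can insert only the rightmost missing comma, because the lookahead C<(?!\d)> requires the three digits to be the last ones in their run.

```perl
use strict;
use warnings;

my $number = "1234567";
my $passes = 0;

# Repeat the substitution until it finds nothing left to change.
$passes++ while $number =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

print "$number after $passes passes\n";
```
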
1347*0Sstevel@tonic-gate=item tr/SEARCHLIST/REPLACEMENTLIST/cds
1348*0Sstevel@tonic-gate
1349*0Sstevel@tonic-gate=item y/SEARCHLIST/REPLACEMENTLIST/cds
1350*0Sstevel@tonic-gate
1351*0Sstevel@tonic-gateTransliterates all occurrences of the characters found in the search list
1352*0Sstevel@tonic-gatewith the corresponding character in the replacement list.  It returns
1353*0Sstevel@tonic-gatethe number of characters replaced or deleted.  If no string is
1354*0Sstevel@tonic-gatespecified via the =~ or !~ operator, the $_ string is transliterated.  (The
1355*0Sstevel@tonic-gatestring specified with =~ must be a scalar variable, an array element, a
1356*0Sstevel@tonic-gatehash element, or an assignment to one of those, i.e., an lvalue.)
1357*0Sstevel@tonic-gate
1358*0Sstevel@tonic-gateA character range may be specified with a hyphen, so C<tr/A-J/0-9/>
1359*0Sstevel@tonic-gatedoes the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
1360*0Sstevel@tonic-gateFor B<sed> devotees, C<y> is provided as a synonym for C<tr>.  If the
1361*0Sstevel@tonic-gateSEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
1362*0Sstevel@tonic-gateits own pair of quotes, which may or may not be bracketing quotes,
1363*0Sstevel@tonic-gatee.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.
1364*0Sstevel@tonic-gate
Note that C<tr> does B<not> do regular expression character classes
such as C<\d> or C<[:lower:]>.  The C<tr> operator is not equivalent to
the tr(1) utility.  If you want to map strings between lower/upper
cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
using the C<s> operator if you need regular expressions.
1370*0Sstevel@tonic-gate
Note also that the whole range idea is rather unportable between
character sets--and even within a character set, ranges may cause results
you probably didn't expect.  A sound principle is to use only ranges
whose endpoints are both letters of the same case (a-e, A-E)
or both digits (0-4).  Anything else is unsafe.  If in doubt, spell out the
character sets in full.
1377*0Sstevel@tonic-gate
1378*0Sstevel@tonic-gateOptions:
1379*0Sstevel@tonic-gate
1380*0Sstevel@tonic-gate    c	Complement the SEARCHLIST.
1381*0Sstevel@tonic-gate    d	Delete found but unreplaced characters.
1382*0Sstevel@tonic-gate    s	Squash duplicate replaced characters.
1383*0Sstevel@tonic-gate
1384*0Sstevel@tonic-gateIf the C</c> modifier is specified, the SEARCHLIST character set
1385*0Sstevel@tonic-gateis complemented.  If the C</d> modifier is specified, any characters
1386*0Sstevel@tonic-gatespecified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
1387*0Sstevel@tonic-gate(Note that this is slightly more flexible than the behavior of some
1388*0Sstevel@tonic-gateB<tr> programs, which delete anything they find in the SEARCHLIST,
1389*0Sstevel@tonic-gateperiod.) If the C</s> modifier is specified, sequences of characters
1390*0Sstevel@tonic-gatethat were transliterated to the same character are squashed down
1391*0Sstevel@tonic-gateto a single instance of the character.
1392*0Sstevel@tonic-gate
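The three modifiers can be compared side by side.  This is a small sketch with made-up input strings and variable names:

```perl
use strict;
use warnings;

# /c and /d together: delete every character that is NOT in a-z.
(my $deleted = "hello, world") =~ tr/a-z//cd;

# /d: "a" has a replacement; "b" and "c" have none and are deleted.
(my $partial = "abcabc") =~ tr/abc/A/d;

# /s: a, b and c all become "x", and the run of x's is squashed to one.
(my $squashed = "aabbccdd") =~ tr/a-c/x/s;

print "$deleted $partial $squashed\n";
```
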
1393*0Sstevel@tonic-gateIf the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
1394*0Sstevel@tonic-gateexactly as specified.  Otherwise, if the REPLACEMENTLIST is shorter
1395*0Sstevel@tonic-gatethan the SEARCHLIST, the final character is replicated till it is long
1396*0Sstevel@tonic-gateenough.  If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
1397*0Sstevel@tonic-gateThis latter is useful for counting characters in a class or for
1398*0Sstevel@tonic-gatesquashing character sequences in a class.
1399*0Sstevel@tonic-gate
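The replication rules can be checked directly.  A minimal sketch; the strings and variable names are invented for the example:

```perl
use strict;
use warnings;

# Short REPLACEMENTLIST: the final "B" is replicated, as if AB were ABBBB.
(my $replicated = "abcde") =~ tr/a-e/AB/;

# With /d the list is taken exactly as written: c, d and e are deleted.
(my $exact = "abcde") =~ tr/a-e/AB/d;

# Empty REPLACEMENTLIST: characters map to themselves, so this only counts.
my $sky   = "* . * . *";
my $stars = ($sky =~ tr/*//);

print "$replicated $exact $stars '$sky'\n";
```
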
1400*0Sstevel@tonic-gateExamples:
1401*0Sstevel@tonic-gate
1402*0Sstevel@tonic-gate    $ARGV[1] =~ tr/A-Z/a-z/;	# canonicalize to lower case
1403*0Sstevel@tonic-gate
1404*0Sstevel@tonic-gate    $cnt = tr/*/*/;		# count the stars in $_
1405*0Sstevel@tonic-gate
1406*0Sstevel@tonic-gate    $cnt = $sky =~ tr/*/*/;	# count the stars in $sky
1407*0Sstevel@tonic-gate
1408*0Sstevel@tonic-gate    $cnt = tr/0-9//;		# count the digits in $_
1409*0Sstevel@tonic-gate
1410*0Sstevel@tonic-gate    tr/a-zA-Z//s;		# bookkeeper -> bokeper
1411*0Sstevel@tonic-gate
1412*0Sstevel@tonic-gate    ($HOST = $host) =~ tr/a-z/A-Z/;
1413*0Sstevel@tonic-gate
1414*0Sstevel@tonic-gate    tr/a-zA-Z/ /cs;		# change non-alphas to single space
1415*0Sstevel@tonic-gate
1416*0Sstevel@tonic-gate    tr [\200-\377]
1417*0Sstevel@tonic-gate       [\000-\177];		# delete 8th bit
1418*0Sstevel@tonic-gate
1419*0Sstevel@tonic-gateIf multiple transliterations are given for a character, only the
1420*0Sstevel@tonic-gatefirst one is used:
1421*0Sstevel@tonic-gate
1422*0Sstevel@tonic-gate    tr/AAA/XYZ/
1423*0Sstevel@tonic-gate
1424*0Sstevel@tonic-gatewill transliterate any A to X.
1425*0Sstevel@tonic-gate
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
interpolation.  That means that if you want to use variables, you
must use an eval():
1430*0Sstevel@tonic-gate
1431*0Sstevel@tonic-gate    eval "tr/$oldlist/$newlist/";
1432*0Sstevel@tonic-gate    die $@ if $@;
1433*0Sstevel@tonic-gate
1434*0Sstevel@tonic-gate    eval "tr/$oldlist/$newlist/, 1" or die $@;
1435*0Sstevel@tonic-gate
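When the same run-time lists are used many times, one option is to pay for the eval() once and keep the resulting subroutine.  The helper below is hypothetical (C<make_translator> is not part of Perl), and because the lists are pasted into source code they must come from a trusted source and must not contain the C</> delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: build a transliteration sub from two run-time lists.
sub make_translator {
    my ($oldlist, $newlist) = @_;
    # The tr/// is compiled here, once, with the lists filled in.
    my $code = "sub { my \$s = shift; \$s =~ tr/$oldlist/$newlist/; \$s }";
    my $sub  = eval $code;
    die $@ if $@;
    return $sub;
}

my $to_upper = make_translator('a-z', 'A-Z');
my $shouted  = $to_upper->("hello");

print "$shouted\n";
```
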
1436*0Sstevel@tonic-gate=item <<EOF
1437*0Sstevel@tonic-gate
1438*0Sstevel@tonic-gateA line-oriented form of quoting is based on the shell "here-document"
1439*0Sstevel@tonic-gatesyntax.  Following a C<< << >> you specify a string to terminate
1440*0Sstevel@tonic-gatethe quoted material, and all lines following the current line down to
1441*0Sstevel@tonic-gatethe terminating string are the value of the item.  The terminating
1442*0Sstevel@tonic-gatestring may be either an identifier (a word), or some quoted text.  If
1443*0Sstevel@tonic-gatequoted, the type of quotes you use determines the treatment of the
1444*0Sstevel@tonic-gatetext, just as in regular quoting.  An unquoted identifier works like
1445*0Sstevel@tonic-gatedouble quotes.  There must be no space between the C<< << >> and
1446*0Sstevel@tonic-gatethe identifier, unless the identifier is quoted.  (If you put a space it
1447*0Sstevel@tonic-gatewill be treated as a null identifier, which is valid, and matches the first
1448*0Sstevel@tonic-gateempty line.)  The terminating string must appear by itself (unquoted and
1449*0Sstevel@tonic-gatewith no surrounding whitespace) on the terminating line.
1450*0Sstevel@tonic-gate
1451*0Sstevel@tonic-gate       print <<EOF;
1452*0Sstevel@tonic-gate    The price is $Price.
1453*0Sstevel@tonic-gate    EOF
1454*0Sstevel@tonic-gate
1455*0Sstevel@tonic-gate       print << "EOF"; # same as above
1456*0Sstevel@tonic-gate    The price is $Price.
1457*0Sstevel@tonic-gate    EOF
1458*0Sstevel@tonic-gate
1459*0Sstevel@tonic-gate       print << `EOC`; # execute commands
1460*0Sstevel@tonic-gate    echo hi there
1461*0Sstevel@tonic-gate    echo lo there
1462*0Sstevel@tonic-gate    EOC
1463*0Sstevel@tonic-gate
1464*0Sstevel@tonic-gate       print <<"foo", <<"bar"; # you can stack them
1465*0Sstevel@tonic-gate    I said foo.
1466*0Sstevel@tonic-gate    foo
1467*0Sstevel@tonic-gate    I said bar.
1468*0Sstevel@tonic-gate    bar
1469*0Sstevel@tonic-gate
1470*0Sstevel@tonic-gate       myfunc(<< "THIS", 23, <<'THAT');
1471*0Sstevel@tonic-gate    Here's a line
1472*0Sstevel@tonic-gate    or two.
1473*0Sstevel@tonic-gate    THIS
1474*0Sstevel@tonic-gate    and here's another.
1475*0Sstevel@tonic-gate    THAT
1476*0Sstevel@tonic-gate
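The effect of the quoting style on the body can be seen by capturing the same text twice.  A short sketch; C<$Price> is given an arbitrary value and the other variable names are illustrative:

```perl
use strict;
use warnings;

my $Price = 42;

# Double-quoted terminator: the body is interpolated.
my $interpolated = <<"EOT";
The price is $Price.
EOT

# Single-quoted terminator: the body is taken literally.
my $literal = <<'EOT';
The price is $Price.
EOT

print $interpolated, $literal;
```
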
1477*0Sstevel@tonic-gateJust don't forget that you have to put a semicolon on the end
1478*0Sstevel@tonic-gateto finish the statement, as Perl doesn't know you're not going to
1479*0Sstevel@tonic-gatetry to do this:
1480*0Sstevel@tonic-gate
1481*0Sstevel@tonic-gate       print <<ABC
1482*0Sstevel@tonic-gate    179231
1483*0Sstevel@tonic-gate    ABC
1484*0Sstevel@tonic-gate       + 20;
1485*0Sstevel@tonic-gate
1486*0Sstevel@tonic-gateIf you want your here-docs to be indented with the
1487*0Sstevel@tonic-gaterest of the code, you'll need to remove leading whitespace
1488*0Sstevel@tonic-gatefrom each line manually:
1489*0Sstevel@tonic-gate
1490*0Sstevel@tonic-gate    ($quote = <<'FINIS') =~ s/^\s+//gm;
1491*0Sstevel@tonic-gate       The Road goes ever on and on,
1492*0Sstevel@tonic-gate       down from the door where it began.
1493*0Sstevel@tonic-gate    FINIS
1494*0Sstevel@tonic-gate
1495*0Sstevel@tonic-gateIf you use a here-doc within a delimited construct, such as in C<s///eg>,
1496*0Sstevel@tonic-gatethe quoted material must come on the lines following the final delimiter.
1497*0Sstevel@tonic-gateSo instead of
1498*0Sstevel@tonic-gate
1499*0Sstevel@tonic-gate    s/this/<<E . 'that'
1500*0Sstevel@tonic-gate    the other
1501*0Sstevel@tonic-gate    E
1502*0Sstevel@tonic-gate     . 'more '/eg;
1503*0Sstevel@tonic-gate
1504*0Sstevel@tonic-gateyou have to write
1505*0Sstevel@tonic-gate
1506*0Sstevel@tonic-gate    s/this/<<E . 'that'
1507*0Sstevel@tonic-gate     . 'more '/eg;
1508*0Sstevel@tonic-gate    the other
1509*0Sstevel@tonic-gate    E
1510*0Sstevel@tonic-gate
If the terminating identifier is on the last line of the program, you
must be sure there is a newline after it; otherwise, Perl will fail with the
error B<Can't find string terminator "END" anywhere before EOF...>.
1514*0Sstevel@tonic-gate
1515*0Sstevel@tonic-gateAdditionally, the quoting rules for the identifier are not related to
1516*0Sstevel@tonic-gatePerl's quoting rules -- C<q()>, C<qq()>, and the like are not supported
1517*0Sstevel@tonic-gatein place of C<''> and C<"">, and the only interpolation is for backslashing
1518*0Sstevel@tonic-gatethe quoting character:
1519*0Sstevel@tonic-gate
1520*0Sstevel@tonic-gate    print << "abc\"def";
1521*0Sstevel@tonic-gate    testing...
1522*0Sstevel@tonic-gate    abc"def
1523*0Sstevel@tonic-gate
1524*0Sstevel@tonic-gateFinally, quoted strings cannot span multiple lines.  The general rule is
1525*0Sstevel@tonic-gatethat the identifier must be a string literal.  Stick with that, and you
1526*0Sstevel@tonic-gateshould be safe.
1527*0Sstevel@tonic-gate
1528*0Sstevel@tonic-gate=back
1529*0Sstevel@tonic-gate
1530*0Sstevel@tonic-gate=head2 Gory details of parsing quoted constructs
1531*0Sstevel@tonic-gate
1532*0Sstevel@tonic-gateWhen presented with something that might have several different
1533*0Sstevel@tonic-gateinterpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
1534*0Sstevel@tonic-gateprinciple to pick the most probable interpretation.  This strategy
1535*0Sstevel@tonic-gateis so successful that Perl programmers often do not suspect the
1536*0Sstevel@tonic-gateambivalence of what they write.  But from time to time, Perl's
1537*0Sstevel@tonic-gatenotions differ substantially from what the author honestly meant.
1538*0Sstevel@tonic-gate
1539*0Sstevel@tonic-gateThis section hopes to clarify how Perl handles quoted constructs.
1540*0Sstevel@tonic-gateAlthough the most common reason to learn this is to unravel labyrinthine
1541*0Sstevel@tonic-gateregular expressions, because the initial steps of parsing are the
1542*0Sstevel@tonic-gatesame for all quoting operators, they are all discussed together.
1543*0Sstevel@tonic-gate
1544*0Sstevel@tonic-gateThe most important Perl parsing rule is the first one discussed
1545*0Sstevel@tonic-gatebelow: when processing a quoted construct, Perl first finds the end
1546*0Sstevel@tonic-gateof that construct, then interprets its contents.  If you understand
1547*0Sstevel@tonic-gatethis rule, you may skip the rest of this section on the first
1548*0Sstevel@tonic-gatereading.  The other rules are likely to contradict the user's
1549*0Sstevel@tonic-gateexpectations much less frequently than this first one.
1550*0Sstevel@tonic-gate
1551*0Sstevel@tonic-gateSome passes discussed below are performed concurrently, but because
1552*0Sstevel@tonic-gatetheir results are the same, we consider them individually.  For different
1553*0Sstevel@tonic-gatequoting constructs, Perl performs different numbers of passes, from
1554*0Sstevel@tonic-gateone to five, but these passes are always performed in the same order.
1555*0Sstevel@tonic-gate
1556*0Sstevel@tonic-gate=over 4
1557*0Sstevel@tonic-gate
1558*0Sstevel@tonic-gate=item Finding the end
1559*0Sstevel@tonic-gate
The first pass is finding the end of the quoted construct, whether
it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
construct, a C</> that terminates a C<qq//> construct, a C<]> which
terminates a C<qq[]> construct, or a C<< > >> which terminates a
fileglob started with C<< < >>.
1565*0Sstevel@tonic-gate
When searching for single-character non-pairing delimiters, such
as C</>, combinations of C<\\> and C<\/> are skipped.  However,
when searching for a single-character pairing delimiter like C<[>,
combinations of C<\\>, C<\]>, and C<\[> are all skipped, and nested
C<[>, C<]> are skipped as well.  When searching for multicharacter
delimiters, nothing is skipped.

For constructs with three-part delimiters (C<s///>, C<y///>, and
C<tr///>), the search is repeated once more.

During this search no attention is paid to the semantics of the construct.
Thus:

    "$hash{"$foo/$bar"}"

or:

    m/
      bar       # NOT a comment, this slash / terminated m//!
     /x

do not form legal quoted expressions.  The quoted part ends on the
first C<"> and C</>, and the rest happens to be a syntax error.
Because the slash that terminated C<m//> was followed by a C<SPACE>,
the example above is not C<m//x>, but rather C<m//> with no C</x>
modifier.  So the embedded C<#> is interpreted as a literal C<#>.
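
One way around both problems is to keep the terminating character out
of the quoted text altogether.  This is only a sketch: the hash, its
key, and the string being matched are invented for the example.

```perl
use strict;
use warnings;

my ($foo, $bar) = ('usr', 'bin');
my %hash = ('usr/bin' => 'found');

# Build the key outside the string, so no inner double quote can end it.
my $key   = "$foo/$bar";
my $value = "$hash{$key}";

# A pairing delimiter lets a comment contain a slash under /x.
my $matched = 'foobar' =~ m{
    bar     # a slash / inside this comment is harmless here
}x;

print "$value\n";
print $matched ? "matched\n" : "no match\n";
```

With C<m{}x> the comment must not contain an unbalanced brace, for the
same reason the slash was a problem with C<m//x>.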

=item Removal of backslashes before delimiters

During the second pass, text between the starting and ending
delimiters is copied to a safe location, and the C<\> is removed
from combinations consisting of C<\> and the delimiter, or of C<\>
and either delimiter when the starting and ending delimiters differ.
This removal does not happen for multi-character delimiters.
Note that the combination C<\\> is left intact, just as it was.

Starting from this step no information about the delimiters is
used in parsing.

=item Interpolation

The next step is interpolation in the text obtained, which is now
delimiter-independent.  There are four different cases.

=over 4

=item C<< <<'EOF' >>, C<m''>, C<s'''>, C<tr///>, C<y///>

No interpolation is performed.

=item C<''>, C<q//>

The only interpolation is removal of C<\> from pairs C<\\>.
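
A quick check of the single-quote rule, using nothing but C<length>
(the variable names are illustrative):

```perl
use strict;
use warnings;

my $pair   = 'a\\b';   # three characters: a, backslash, b
my $escape = 'a\nb';   # four characters: a, backslash, n, b

printf "%d %d\n", length $pair, length $escape;    # 3 4
```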

=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>

C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
converted to corresponding Perl constructs.  Thus, C<"$foo\Qbaz$bar">
is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
The other combinations are replaced with appropriate expansions.

Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
is interpolated in the usual way.  Something like C<"\Q\\E"> has
no C<\E> inside.  Instead, it has C<\Q>, C<\\>, and C<E>, so the
result is the same as for C<"\\\\E">.  As a general rule, backslashes
between C<\Q> and C<\E> may lead to counterintuitive results.  So,
C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
as C<"\\\t"> (since TAB is not alphanumeric).  Note also that:

  $str = '\t';
  return "\Q$str";

may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
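
The difference between the two spellings shows up in the lengths of
the results.  A minimal sketch, with variable names chosen for the
example:

```perl
use strict;
use warnings;

my $tab_quoted = "\Q\t\E";    # quotemeta of a TAB character
my $str        = '\t';        # a backslash followed by the letter t
my $lit_quoted = "\Q$str";    # quotemeta of those two characters

printf "%d %d\n", length $tab_quoted, length $lit_quoted;    # 2 3
```

The first result is a backslash and a TAB; the second is two
backslashes and a C<t>.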

Interpolated scalars and arrays are converted internally to the C<join> and
C<.> catenation operations.  Thus, C<"$foo XXX '@arr'"> becomes:

  $foo . " XXX '" . (join $", @arr) . "'";

All operations above are performed simultaneously, left to right.
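
You can convince yourself of the equivalence by building the string
both ways.  The values here are arbitrary, and C<$"> is left at its
default of a single space.

```perl
use strict;
use warnings;

my $foo = 'head';
my @arr = (1, 2, 3);

my $interpolated = "$foo XXX '@arr'";
my $explicit     = $foo . " XXX '" . join($", @arr) . "'";

print "$interpolated\n";                                     # head XXX '1 2 3'
print $interpolated eq $explicit ? "same\n" : "different\n"; # same
```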

Because the result of C<"\Q STRING \E"> has all metacharacters
quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair.  If protected by C<\>, C<$> will be quoted to become
C<"\\\$">; if not, it is interpreted as the start of an interpolated
scalar.

Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends.  For instance, whether
C<< "a $b -> {c}" >> really means:

  "a " . $b . " -> {c}";

or:

  "a " . $b -> {c};

Most of the time, the decision is to take the longest possible text
that does not include spaces between components and that contains
matching braces or brackets.  Because the outcome may be determined
by voting based on heuristic estimators, the result is not strictly
predictable.  Fortunately, it's usually correct for ambiguous cases.
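
The effect of the spaces can be seen with a hash reference.  This is a
sketch with an invented reference; the exact address printed for the
spaced form will differ from run to run.

```perl
use strict;
use warnings;

my $ref = { c => 'value' };

my $deref  = "a $ref->{c}";      # no spaces: the hash element is interpolated
my $spaced = "a $ref -> {c}";    # spaces: only $ref itself is interpolated
my $plain  = 'a ' . $ref->{c};   # explicit catenation leaves nothing to guess

print "$deref\n";
print "$spaced\n";
```

When in doubt, spelling out the catenation as in the last assignment
avoids relying on the guess at all.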

=item C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>

Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
happens (almost) as with C<qq//> constructs, but the substitution
of C<\> followed by RE-special chars (including C<\>) is not
performed.  Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
a C<#>-comment in a C<//x>-regular expression, no processing is
performed whatsoever.  This is the first step at which the presence
of the C<//x> modifier is relevant.

Interpolation has several quirks: C<$|>, C<$(>, and C<$)> are not
interpolated, and constructs C<$var[SOMETHING]> are voted (by several
different estimators) to be either an array element or C<$var>
followed by an RE alternative.  This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
array element C<-9>, not as a regular expression from the variable
C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>.  Since voting among different estimators may occur,
the result is not predictable.
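
Braces can force either reading.  In this sketch a scalar and an array
deliberately share the name C<arr>; both variables and the test
strings are made up for the example.

```perl
use strict;
use warnings;

my $arr = 'id';            # a scalar ...
my @arr = ('x', 'y');      # ... and an unrelated array of the same name

# Braces around the name only: $arr followed by the character class [0-9].
my $class_match   = 'id7' =~ /^${arr}[0-9]$/;
# Braces around name and subscript: the array element $arr[1].
my $element_match = 'y' =~ /^${arr[1]}$/;

print $class_match   ? "class\n"   : "no class\n";
print $element_match ? "element\n" : "no element\n";
```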

It is at this step that C<\1> is begrudgingly converted to C<$1> in
the replacement text of C<s///> to correct the incorrigible
I<sed> hackers who haven't picked up the saner idiom yet.  A warning
is emitted if the C<use warnings> pragma or the B<-w> command-line flag
(that is, the C<$^W> variable) was set.
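
For the record, the saner idiom looks like this (the sample text is
arbitrary):

```perl
use strict;
use warnings;

my $text = 'hello world';

# Refer to the captured groups as $1 and $2 in the replacement text.
(my $swapped = $text) =~ s/(\w+) (\w+)/$2 $1/;

print "$swapped\n";    # world hello
```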

The lack of processing of C<\\> creates specific restrictions on
the post-processed text.  If the delimiter is C</>, one cannot get
the combination C<\/> into the result of this step.  C</> will
finish the regular expression, C<\/> will be stripped to C</> on
the previous step, and C<\\/> will be left as is.  Because C</> is
equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
alphanumeric char, as in:

  m m ^ a \s* b mmx;

In the RE above, which is intentionally obfuscated for illustration, the
delimiter is C<m>, the modifier is C<mx>, and after backslash-removal the
RE is the same as for C<m/ ^ a \s* b /mx>.  There's more than one
reason you're encouraged to restrict your delimiters to non-alphanumeric,
non-whitespace choices.
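
A more conventional use of alternative delimiters is to avoid
backslashing every slash in a path.  The path below is just an
example.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin';

my ($with_braces)  = $path =~ m{^/usr/(\w+)};     # no backslashes needed
my ($with_slashes) = $path =~ /^\/usr\/(\w+)/;    # same pattern, harder to read

print "$with_braces $with_slashes\n";    # local local
```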

=back

This step is the last one for all constructs except regular expressions,
which are processed further.

=item Interpolation of regular expressions

Previous steps were performed during the compilation of Perl code,
but this one happens at run time--although it may be optimized to
be calculated at compile time if appropriate.  After the preprocessing
described above, and possibly after evaluation if catenation,
joining, casing translation, or metaquoting are involved, the
resulting I<string> is passed to the RE engine for compilation.
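
Because the RE engine only ever sees the resulting string, whether a
variable's contents were metaquoted makes a real difference.  A small
sketch with an invented variable:

```perl
use strict;
use warnings;

my $literal = 'a.b';

my $raw    = qr/^$literal$/;        # the dot stays a wildcard
my $quoted = qr/^\Q$literal\E$/;    # the dot is metaquoted before compilation

print 'axb' =~ $raw    ? "raw matches axb\n"    : "raw rejects axb\n";
print 'axb' =~ $quoted ? "quoted matches axb\n" : "quoted rejects axb\n";
```

The first pattern is compiled from the string C<^a.b$>, the second
from C<^a\.b$>.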

Whatever happens in the RE engine might be better discussed in L<perlre>,
but for the sake of continuity, we shall do so here.

This is another step where the presence of the C<//x> modifier is
relevant.  The RE engine scans the string from left to right and
converts it to a finite automaton.

Backslashed characters are either replaced with corresponding
literal strings (as with C<\{>), or else they generate special nodes
in the finite automaton (as with C<\b>).  Characters special to the
RE engine (such as C<|>) generate corresponding nodes or groups of
nodes.  C<(?#...)> comments are ignored.  All the rest is either
converted to literal strings to match, or else is ignored (as is
whitespace and C<#>-style comments if C<//x> is present).

Parsing of the bracketed character class construct, C<[...]>, is
rather different than the rule used for the rest of the pattern.
The terminator of this construct is found using the same rules as
for finding the terminator of a C<{}>-delimited construct, the only
exception being that C<]> immediately following C<[> is treated as
though preceded by a backslash.  Similarly, the terminator of
C<(?{...})> is found using the same rules as for finding the
terminator of a C<{}>-delimited construct.

It is possible to inspect both the string given to the RE engine and the
resulting finite automaton.  See the arguments C<debug>/C<debugcolor>
in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
switch documented in L<perlrun/"Command Switches">.

=item Optimization of regular expressions

This step is listed for completeness only.  Since it does not change
semantics, details of this step are not documented and are subject
to change without notice.  This step is performed over the finite
automaton that was generated during the previous pass.

It is at this stage that C<split()> silently optimizes C</^/> to
mean C</^/m>.
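
The practical upshot is that C<split /^/> breaks a string into lines
while keeping the newlines.  The input string here is arbitrary.

```perl
use strict;
use warnings;

my @lines = split /^/, "a\nb\nc\n";

print scalar(@lines), "\n";    # 3
print $lines[1];               # b, followed by its newline
```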

=back

=head2 I/O Operators

There are several I/O operators you should know about.

A string enclosed by backticks (grave accents) first undergoes
double-quote interpolation.  It is then interpreted as an external
command, and the output of that command is the value of the
backtick string, like in a shell.  In scalar context, a single string
consisting of all output is returned.  In list context, a list of
values is returned, one per line of output.  (You can set C<$/> to use
a different line terminator.)  The command is executed each time the
pseudo-literal is evaluated.  The status value of the command is
returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
Unlike in B<csh>, no translation is done on the return data--newlines
remain newlines.  Unlike in any of the shells, single quotes do not
hide variable names in the command from interpretation.  To pass a
literal dollar-sign through to the shell you need to hide it with a
backslash.  The generalized form of backticks is C<qx//>.  (Because
backticks always undergo shell expansion as well, see L<perlsec> for
security concerns.)
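
Here is a sketch of the two contexts and of C<$?>.  It assumes a
POSIX-style shell with an C<echo> command, which is not a given on
every platform.

```perl
use strict;
use warnings;

my $all    = `echo one; echo two`;    # scalar context: one string
my @lines  = `echo one; echo two`;    # list context: one element per line
my $status = $? >> 8;                 # exit status of the last command run

print scalar(@lines), " lines, exit status $status\n";
```

The command runs twice here, once per backtick expression.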

In scalar context, evaluating a filehandle in angle brackets yields
the next line from that file (the newline, if any, included), or
C<undef> at end-of-file or on error.  When C<$/> is set to C<undef>
(sometimes known as file-slurp mode) and the file is empty, it
returns C<''> the first time, followed by C<undef> subsequently.

Ordinarily you must assign the returned value to a variable, but
there is one situation where an automatic assignment happens.  If
and only if the input symbol is the only thing inside the conditional
of a C<while> statement (even if disguised as a C<for(;;)> loop),
the value is automatically assigned to the global variable $_,
destroying whatever was there previously.  (This may seem like an
odd thing to you, but you'll use the construct in almost every Perl
script you write.)  The $_ variable is not implicitly localized.
You'll have to put a C<local $_;> before the loop if you want that
to happen.
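
A subroutine that reads a file this way is a typical place to localize
$_.  The following sketch uses an in-memory file so that it is
self-contained; C<count_lines> is a name invented for the example.

```perl
use strict;
use warnings;

# Count lines without disturbing the caller's $_.
sub count_lines {
    my ($fh) = @_;
    local $_;                 # restored automatically when the sub returns
    my $count = 0;
    $count++ while <$fh>;     # each line is assigned to $_ implicitly
    return $count;
}

my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

$_ = 'precious';
my $count = count_lines($fh);
print "$count lines, \$_ is still '$_'\n";
```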

The following lines are equivalent:

    while (defined($_ = <STDIN>)) { print; }
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while ($_ = <STDIN>);
    print while <STDIN>;

This also behaves similarly, but avoids $_ :

    while (my $line = <STDIN>) { print $line }

In these loop constructs, the assigned value (whether assignment
is automatic or explicit) is then tested to see whether it is
defined.  The defined test avoids problems where a line has a string
value that would be treated as false by Perl, for example a "" or
a "0" with no trailing newline.  If you really mean for such values
to terminate the loop, they should be tested for explicitly:

    while (($_ = <STDIN>) ne '0') { ... }
    while (<STDIN>) { last unless $_; ... }
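
The implicit C<defined> test can be observed with input whose last
line is a bare "0".  This sketch reads from an in-memory file; the
data is made up.

```perl
use strict;
use warnings;

my $data = "a\n0";    # the final line is "0" with no trailing newline
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {    # tested with defined(), not for truth
    push @seen, $line;
}

print scalar(@seen), "\n";    # 2
```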

In other boolean contexts, C<< <I<filehandle>> >> without an
explicit C<defined> test or comparison elicits a warning if the
C<use warnings> pragma or the B<-w>
command-line switch (the C<$^W> variable) is in effect.

The filehandles STDIN, STDOUT, and STDERR are predefined.  (The
filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
in packages, where they would be interpreted as local identifiers
rather than global.)  Additional filehandles may be created with
the open() function, amongst others.  See L<perlopentut> and
L<perlfunc/open> for details on this.

If a <FILEHANDLE> is used in a context that is looking for
a list, a list comprising all input lines is returned, one line per
list element.  It's easy to grow to a rather large data space this
way, so use with care.

<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
See L<perlfunc/readline>.
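
Both spellings and both contexts appear in the sketch below, which
again reads from an in-memory file with made-up contents.

```perl
use strict;
use warnings;

my $data = "first\nsecond\nthird\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $first = readline($fh);    # same as <$fh> in scalar context
my @rest  = <$fh>;            # list context: every remaining line

chomp($first, @rest);
print "$first | @rest\n";     # first | second third
```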

The null filehandle <> is special: it can be used to emulate the
behavior of B<sed> and B<awk>.  Input from <> comes either from
standard input, or from each file listed on the command line.  Here's
how it works: the first time <> is evaluated, the @ARGV array is
checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
gives you standard input.  The @ARGV array is then processed as a list
of filenames.  The loop

    while (<>) {
        ...                     # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...                 # code for each line
        }
    }

except that it isn't so cumbersome to say, and will actually work.
It really does shift the @ARGV array and put the current filename
into the $ARGV variable.  It also uses filehandle I<ARGV>
internally--<> is just a synonym for <ARGV>, which
is magical.  (The pseudo code above doesn't work because it treats
<ARGV> as non-magical.)

You can modify @ARGV before the first <> as long as the array ends up
containing the list of filenames you really want.  Line numbers (C<$.>)
continue as though the input were one big happy file.  See the example
in L<perlfunc/eof> for how to reset line numbers on each file.
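
As a self-contained sketch of that idiom, the following creates two
scratch files with File::Temp to stand in for command-line arguments,
then records C<$.> for each line read.  The file contents are
invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two scratch files stand in for command-line arguments.
my @names;
for my $content ("a\nb\n", "c\n") {
    my ($out, $name) = tempfile(UNLINK => 1);
    print {$out} $content;
    close $out or die "cannot close $name: $!";
    push @names, $name;
}

my @numbers;
{
    local @ARGV = @names;
    while (<>) {
        push @numbers, $.;
    }
    continue {
        close ARGV if eof;    # resets $. at the end of each file
    }
}

print "@numbers\n";    # 1 2 1
```

Without the C<continue> block the numbers would run 1 2 3.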

If you want to set @ARGV to your own list of files, go right ahead.
This sets @ARGV to all plain text files if no @ARGV was given:

    @ARGV = grep { -f && -T } glob('*') unless @ARGV;

You can even set them to pipe commands.  For example, this automatically
filters compressed arguments through B<gzip>:

    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;

If you want to pass switches into your script, you can use one of the
Getopts modules or put a loop on the front like this:

    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        # ...           # other switches
    }

    while (<>) {
        # ...           # code for each line
    }
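
For comparison, here is roughly the same switch handling done with the
standard Getopt::Std module.  The command line is faked with
C<local @ARGV> so the sketch runs on its own; the switch letters mirror
the loop above.

```perl
use strict;
use warnings;
use Getopt::Std;

# Stand-in for a real command line.
local @ARGV = ('-v', '-Dtrace', 'input.txt');

my %opts;
getopts('vD:', \%opts) or die "usage: $0 [-v] [-D level] file ...\n";

print "verbose=$opts{v} debug=$opts{D} files=@ARGV\n";
# verbose=1 debug=trace files=input.txt
```

After C<getopts> returns, only the non-switch arguments remain in
@ARGV, ready for a C<< while (<>) >> loop.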

The <> symbol will return C<undef> for end-of-file only once.
If you call it again after this, it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will read input from STDIN.

If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the
filehandle to input from, or its typeglob, or a reference to the
same.  For example:

    $fh = \*STDIN;
    $line = <$fh>;

If what's within the angle brackets is neither a filehandle nor a simple
scalar variable containing a filehandle name, typeglob, or typeglob
reference, it is interpreted as a filename pattern to be globbed, and
either a list of filenames or the next filename in the list is returned,
depending on context.  This distinction is determined on syntactic
grounds alone.  That means C<< <$x> >> is always a readline() from
an indirect handle, but C<< <$hash{key}> >> is always a glob().
That's because $x is a simple scalar variable, but C<$hash{key}> is
not--it's a hash element.
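
If you do keep filehandles in a hash, there are two easy ways to read
from them.  This is a sketch; the hash and its key are invented, and
an in-memory file keeps it self-contained.

```perl
use strict;
use warnings;

my $data = "first line\nsecond line\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my %handle = (in => $in);

# <$handle{in}> would be parsed as a glob, so call readline() directly ...
my $first = readline($handle{in});

# ... or copy the handle into a simple scalar first.
my $fh     = $handle{in};
my $second = <$fh>;

print $first, $second;
```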

One level of double-quote interpretation is done first, but you can't
say C<< <$foo> >> because that's an indirect filehandle as explained
in the previous paragraph.  (In older versions of Perl, programmers
would insert curly brackets to force interpretation as a filename glob:
C<< <${foo}> >>.  These days, it's considered cleaner to call the
internal function directly as C<glob($foo)>, which is probably the right
way to have done it in the first place.)  For example:

    while (<*.c>) {
        chmod 0644, $_;
    }

is roughly equivalent to:

    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chomp;
        chmod 0644, $_;
    }

except that the globbing is actually done internally using the standard
C<File::Glob> extension.  Of course, the shortest way to do the above is:

    chmod 0644, <*.c>;

A (file)glob evaluates its (embedded) argument only when it is
starting a new list.  All values must be read before it will start
over.  In list context, this isn't important because you automatically
get them all anyway.  However, in scalar context the operator returns
the next value each time it's called, or C<undef> when the list has
run out.  As with filehandle reads, an automatic C<defined> is
generated when the glob occurs in the test part of a C<while>,
because legal glob returns (e.g. a file called F<0>) would otherwise
terminate the loop.  Again, C<undef> is returned only once.  So if
you're expecting a single value from a glob, it is much better to
say

    ($file) = <blurch*>;

than

    $file = <blurch*>;

because the latter will alternate between returning a filename and
returning false.
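
The list-context form can be tried safely in a scratch directory.
This sketch assumes the temporary directory's path contains no
whitespace or glob metacharacters; the file names are taken from the
example above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(blurch1 blurch2)) {
    open my $out, '>', "$dir/$name" or die "cannot create $name: $!";
    close $out or die "cannot close $name: $!";
}

my ($first) = glob("$dir/blurch*");    # list context: keep the first match
my @all     = glob("$dir/blurch*");    # list context: keep every match

print scalar(@all), " matches\n";      # 2 matches
```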

If you're trying to do variable interpolation, it's definitely better
to use the glob() function, because the older notation can cause people
to become confused with the indirect filehandle notation.

    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);

=head2 Constant Folding

Like C, Perl does a certain amount of expression evaluation at
compile time whenever it determines that all arguments to an
operator are static and have no side effects.  In particular, string
concatenation happens at compile time between literals that don't do
variable substitution.  Backslash interpolation also happens at
compile time.  You can say

    'Now is the time for all' . "\n" .
        'good men to come to.'

and this all reduces to one string internally.  Likewise, if
you say

    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) {  }
    }

the compiler will precompute the number which that expression
represents so that the interpreter won't have to.

=head2 Bitwise String Operators

Bitstrings of any size may be manipulated by the bitwise operators
(C<~ | & ^>).

If the operands to a binary bitwise op are strings of different
sizes, B<|> and B<^> ops act as though the shorter operand had
additional zero bits on the right, while the B<&> op acts as though
the longer operand were truncated to the length of the shorter.
The granularity for such extension or truncation is one or more
bytes.

    # ASCII-based examples
    print "j p \n" ^ " a h";            # prints "JAPH\n"
    print "JA" | "  ph\n";              # prints "japh\n"
    print "japh\nJunk" & '_____';       # prints "JAPH\n"
    print 'p N$' ^ " E<H\n";            # prints "Perl\n"

If you are intending to manipulate bitstrings, be certain that
you're supplying bitstrings: If an operand is a number, that will imply
a B<numeric> bitwise operation.  You may explicitly show which type of
operation you intend by using C<""> or C<0+>, as in the examples below.

    $foo =  150  |  105 ;       # yields 255  (0x96 | 0x69 is 0xFF)
    $foo = '150' |  105 ;       # yields 255
    $foo =  150  | '105';       # yields 255
    $foo = '150' | '105';       # yields string '155' (under ASCII)

    $baz = 0+$foo & 0+$bar;     # both ops explicitly numeric
    $biz = "$foo" ^ "$bar";     # both ops explicitly stringy

See L<perlfunc/vec> for information on how to manipulate individual bits
in a bit vector.
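
As a taste of C<vec>, this sketch sets two bits in an initially empty
string and dumps the result with C<unpack>.  The bit positions are
arbitrary.

```perl
use strict;
use warnings;

my $bits = '';
vec($bits, 0, 1) = 1;    # set bit 0
vec($bits, 3, 1) = 1;    # set bit 3

print unpack('b*', $bits), "\n";                   # 10010000
print vec($bits, 3, 1), vec($bits, 2, 1), "\n";    # 10
```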

=head2 Integer Arithmetic

By default, Perl assumes that it must do most of its arithmetic in
floating point.  But by saying

    use integer;

you may tell the compiler that it's okay to use integer operations
(if it feels like it) from here to the end of the enclosing BLOCK.
An inner BLOCK may countermand this by saying

    no integer;

which lasts until the end of that BLOCK.  Note that this doesn't
mean everything is only an integer, merely that Perl may use integer
operations if it is so inclined.  For example, even under C<use
integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
or so.

Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
and ">>") always produce integral results.  (But see also
L<Bitwise String Operators>.)  However, C<use integer> still has meaning for
them.  By default, their results are interpreted as unsigned integers, but
if C<use integer> is in effect, their results are interpreted
as signed integers.  For example, C<~0> usually evaluates to a large
integral value.  However, C<use integer; ~0> is C<-1> on twos-complement
machines.
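
The block scoping and the signed interpretation can both be seen in a
few lines.  This sketch assumes a twos-complement machine, as the text
above does.

```perl
use strict;
use warnings;

my $unsigned = ~0;          # a large positive integer
my ($signed, $quotient);
{
    use integer;            # in effect until the end of this block
    $signed   = ~0;         # -1
    $quotient = 7 / 2;      # 3
}
my $float_quotient = 7 / 2; # 3.5, outside the block

print "$signed $quotient $float_quotient\n";    # -1 3 3.5
```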

=head2 Floating-point Arithmetic

While C<use integer> provides integer-only arithmetic, there is no
analogous mechanism to provide automatic rounding or truncation to a
certain number of decimal places.  For rounding to a certain number
of digits, sprintf() or printf() is usually the easiest route.
See L<perlfaq4>.
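
For instance (the numbers are arbitrary):

```perl
use strict;
use warnings;

my $pi_ish = sprintf('%.2f', 3.14159);    # "3.14"
my $third  = sprintf('%.3f', 2 / 3);      # "0.667"

print "$pi_ish $third\n";    # 3.14 0.667
```

Note that the results are strings formatted for display; they are
converted back to numbers if you go on to do arithmetic with them.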

Floating-point numbers are only approximations to what a mathematician
would call real numbers.  There are infinitely more reals than floats,
so some corners must be cut.  For example:

    printf "%.20g\n", 123456789123456789;
    #        produces 123456789123456784

Testing floating-point numbers for exact equality or inequality is
not a good idea.  Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
digits (the C<%g> format used below counts significant digits, not
decimal places).  See Knuth, volume II, for a more robust treatment of
this topic.

    sub fp_equal {
	my ($X, $Y, $POINTS) = @_;
	my ($tX, $tY);
	# Format both values to $POINTS significant digits,
	# then compare the resulting strings.
	$tX = sprintf("%.${POINTS}g", $X);
	$tY = sprintf("%.${POINTS}g", $Y);
	return $tX eq $tY;
    }
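To see why this matters, here is a small sketch that contrasts C<==>
with the fp_equal() approach.  It repeats a compact version of the
function so that it runs on its own; the choice of ten digits is an
arbitrary assumption, and the outcome of the exact comparison assumes
a perl built with ordinary IEEE 754 doubles.

```perl
use strict;
use warnings;

# Same approach as fp_equal above: compare both values after formatting
# them to $POINTS significant digits.
sub fp_equal {
    my ($X, $Y, $POINTS) = @_;
    return sprintf("%.${POINTS}g", $X) eq sprintf("%.${POINTS}g", $Y);
}

my $sum = 0.1 + 0.2;

# On IEEE 754 doubles the sum is slightly larger than 0.3.
my $exact  = ($sum == 0.3)           ? "equal" : "not equal";
my $approx = fp_equal($sum, 0.3, 10) ? "equal" : "not equal";

print "exact comparison:    $exact\n";
print "fp_equal, 10 digits: $approx\n";
```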

The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and other mathematical and trigonometric functions.
The Math::Complex module (part of the standard perl distribution)
defines mathematical functions that work on both the reals and the
imaginary numbers.  Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.
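A brief sketch of both modules side by side (the variable names are
illustrative; Re() and Im() are exported by Math::Complex):

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);
use Math::Complex;    # overrides sqrt() so that it accepts negative arguments

my $up   = ceil(2.1);      # 3
my $down = floor(-2.1);    # -3

# With Math::Complex loaded, sqrt(-4) returns a complex number
# instead of failing.
my $root = sqrt(-4);

print "ceil(2.1)   = $up\n";
print "floor(-2.1) = $down\n";
print "sqrt(-4)    = $root\n";
printf "real part %g, imaginary part %g\n", Re($root), Im($root);
```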

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely.  In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.
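One possible way to do that, sketched under stated assumptions: keep
amounts in whole cents and round halves up using integer arithmetic
only.  The helper percent_of_cents() is hypothetical, it handles
non-negative inputs only, and "round half up" is just one of several
rules an application might require.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a rate given in basis points (1/100 of a
# percent) to an amount in whole cents, rounding halves up.
# Assumes non-negative inputs.
sub percent_of_cents {
    my ($cents, $basis_points) = @_;
    use integer;    # integer division truncates, so add half the divisor first
    return ($cents * $basis_points + 5000) / 10000;
}

my $tax  = percent_of_cents(1999, 825);    # 8.25% of 1999 cents is 164.9175 cents
my $half = percent_of_cents(1000, 25);     # 0.25% of 1000 cents is exactly 2.5 cents

print "tax:  $tax cents\n";     # prints tax:  165 cents
print "half: $half cents\n";    # prints half: 3 cents
```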

=head2 Bigger Numbers

The standard Math::BigInt and Math::BigFloat modules provide
variable-precision arithmetic and overloaded operators, although
they're currently pretty slow. At the cost of some space and
considerable speed, they avoid the normal pitfalls associated with
limited-precision representations.

    use Math::BigInt;
    $x = Math::BigInt->new('123456789123456789');
    print $x * $x;

    # prints 15241578780673678515622620750190521
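Math::BigFloat works the same way for decimal fractions.  A minimal
sketch (the variable names are illustrative):

```perl
use strict;
use warnings;
use Math::BigFloat;

# Build the operands from strings so they never pass through a binary float.
my $x = Math::BigFloat->new('0.1');
my $y = Math::BigFloat->new('0.2');

my $sum = $x + $y;    # overloaded +, returns a new Math::BigFloat

print "$sum\n";
print "matches 0.3 exactly\n" if $sum == Math::BigFloat->new('0.3');
```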

Several modules let you calculate with unlimited precision (bounded
only by memory and CPU time) or with fixed precision.  There are also
some non-standard modules that provide faster implementations via
external C libraries.

Here is a short but incomplete summary:

	Math::Fraction		big, unlimited fractions like 9973 / 12967
	Math::String		treat string sequences like numbers
	Math::FixedPrecision	calculate with a fixed precision
	Math::Currency		for currency calculations
	Bit::Vector		manipulate bit vectors fast (uses C)
	Math::BigIntFast	Bit::Vector wrapper for big numbers
	Math::Pari		provides access to the Pari C library
	Math::BigInteger	uses an external C library
	Math::Cephes		uses external Cephes C library (no big numbers)
	Math::Cephes::Fraction	fractions via the Cephes library
	Math::GMP		another one using an external C library

Choose wisely.

=cut