=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

  =~ determines to which variable the regex is applied.
     In its absence, $_ is used.

        $var =~ /foo/;

  !~ determines to which variable the regex is applied,
     and negates the result of the match; it returns
     false if the match succeeds, and true if it fails.

        $var !~ /foo/;

  m/pattern/igmsoxc searches a string for a pattern match,
     applying the given options.

        i  case-Insensitive
        g  Global - all occurrences
        m  Multiline mode - ^ and $ match internal lines
        s  match as a Single line - . matches \n
        o  compile pattern Once
        x  eXtended legibility - free whitespace and comments
        c  don't reset pos on failed matches when using /g

     If 'pattern' is an empty string, the last successfully matched
     regex is used. Delimiters other than '/' may be used for both this
     operator and the following ones.

  qr/pattern/imsox lets you store a regex in a variable,
     or pass one around. Modifiers are as for m// and are stored
     within the regex.

  s/pattern/replacement/igmsoxe substitutes matches of
     'pattern' with 'replacement'. Modifiers are as for m//,
     with one addition:

        e  Evaluate replacement as an expression

     'e' may be specified multiple times. 'replacement' is interpreted
     as a double-quoted string unless a single quote (') is the delimiter.

  ?pattern? is like m/pattern/ but matches only once. No alternate
     delimiters can be used. Must be reset with reset().

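As a minimal sketch of the operators above, the following exercises
C<=~>, C<!~>, C<qr//> and C<s///e>.  The sample strings and variable
names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "The cat sat on the Cat mat";

# =~ binds the match to $text; /i ignores case, /g finds every occurrence.
my @cats = $text =~ /cat/ig;            # ("cat", "Cat")

# !~ is true when the pattern does NOT match.
my $no_dog = ($text !~ /dog/) ? 1 : 0;  # 1

# qr// stores a compiled regex, together with its modifiers, for reuse.
my $word  = qr/\b[a-z]+\b/i;
my $count = () = $text =~ /$word/g;     # number of words: 7

# s///e evaluates the replacement as Perl code for each match.
(my $doubled = "3 apples, 5 pears") =~ s/(\d+)/$1 * 2/ge;

print "@cats\n$no_dog\n$count\n$doubled\n";
```
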
=head2 SYNTAX

   \       Escapes the character immediately following it
   .       Matches any single character except a newline (unless /s is used)
   ^       Matches at the beginning of the string (or line, if /m is used)
   $       Matches at the end of the string (or line, if /m is used)
   *       Matches the preceding element 0 or more times
   +       Matches the preceding element 1 or more times
   ?       Matches the preceding element 0 or 1 times
   {...}   Specifies a range of occurrences for the element preceding it
   [...]   Matches any one of the characters contained within the brackets
   (...)   Groups subexpressions for capturing to $1, $2...
   (?:...) Groups subexpressions without capturing (cluster)
   |       Matches either the subexpression preceding or following it
   \1, \2 ...  The text from the Nth group

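The table above can be illustrated with a short sketch that combines
capturing, clustering, alternation, a backreference and a counted
repeat.  The input strings are invented for the example.

```perl
use strict;
use warnings;

my $line = "error: disk disk full";

# (...) captures into $1; | chooses between alternatives.
my ($level) = $line =~ /^(error|warning):/;      # "error"

# \1 matches the same text that group 1 captured (a doubled word).
my ($dup) = $line =~ /\b(\w+) \1\b/;             # "disk"

# (?:...) groups without capturing; ? makes the "s" optional.
my $plural = "3 files" =~ /^\d+ (?:file|dir)s?$/ ? 1 : 0;

# [...] matches one listed character; {4,8} bounds the repeat count.
my $is_hex = "c0ffee" =~ /^[0-9a-f]{4,8}$/ ? 1 : 0;

print "$level $dup $plural $is_hex\n";
```
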
=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \033     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End case modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

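A hedged sketch of a few of these escapes in use: C<\Q> for literal
matching of interpolated text, C<\u> and C<\L> in a replacement, and
the two meanings of C<\b>.  All sample values are illustrative.

```perl
use strict;
use warnings;

# \Q...\E quotes metacharacters so interpolated text matches literally.
my $needle   = "a.b*c";
my $quoted   = "xxa.b*cxx" =~ /\Q$needle\E/ ? 1 : 0;   # 1: literal match
my $unquoted = "xxa.b*cxx" =~ /$needle/     ? 1 : 0;   # 0: . and * are special

# \u titlecases the next character; \L lowercases what follows.
(my $name = "pERL") =~ s/(\w)(\w*)/\u$1\L$2/;          # "Perl"

# \e, \x1b and octal \033 all denote the escape character.
my $same = ("\e" =~ /^\x1b$/ && "\e" =~ /^\033$/) ? 1 : 0;

# \b is a word boundary in a pattern but a backspace inside [...].
my $boundary  = "cat flap" =~ /cat\b/  ? 1 : 0;
my $backspace = "\x08"     =~ /^[\b]$/ ? 1 : 0;

print "$quoted $unquoted $name $same $boundary $backspace\n";
```
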
=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work both inside and outside a character class.
The first six are locale aware; all are Unicode aware.  The default
character class equivalents are given.  See L<perllocale> and
L<perlunicode> for details.

   \d      A digit                     [0-9]
   \D      A nondigit                  [^0-9]
   \w      A word character            [a-zA-Z0-9_]
   \W      A non-word character        [^a-zA-Z0-9_]
   \s      A whitespace character      [ \t\n\r\f]
   \S      A non-whitespace character  [^ \t\n\r\f]

   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with long name
   \PP     Match non-P
   \P{...} Match lack of Unicode property with long name
   \X      Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum              Alphanumeric
   alpha   IsAlpha              Alphabetic
   ascii   IsASCII              Any ASCII char
   blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
   cntrl   IsCntrl              Control characters
   digit   IsDigit  \d          Digits
   graph   IsGraph              Alphanumeric and punctuation
   lower   IsLower              Lowercase chars (locale and Unicode aware)
   print   IsPrint              Alphanumeric, punct, and space
   punct   IsPunct              Punctuation
   space   IsSpace  [\s\ck]     Whitespace
           IsSpacePerl   \s     Perl's whitespace definition
   upper   IsUpper              Uppercase chars (locale and Unicode aware)
   word    IsWord   \w          Alphanumeric plus _ (Perl extension)
   xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}

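The following sketch shows the shorthand, POSIX and Unicode-property
forms side by side on a small ASCII string.  The input is made up;
note that a POSIX class must itself sit inside a bracketed class, as
in C<[[:digit:]]>.

```perl
use strict;
use warnings;

my $s = "Room 42b, floor 3";

my @numbers = $s =~ /\d+/g;            # ("42", "3")
my @words   = $s =~ /\w+/g;            # ("Room", "42b", "floor", "3")
my @fields  = split /\s*,\s*/, $s;     # ("Room 42b", "floor 3")

# POSIX form: same matches as \d+ for this ASCII input.
my @posix = $s =~ /[[:digit:]]+/g;

# Unicode property form.
my @upper = $s =~ /\p{IsUpper}/g;      # ("R")

# A negated class matches any character that is not listed.
(my $letters_only = $s) =~ s/[^a-zA-Z]//g;   # "Roombfloor"

print "@numbers | @words | @posix | @upper | $letters_only\n";
```
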
=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

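A small sketch of how the anchors differ in practice: C<^> with and
without C</m>, C<\Z> versus C<\z>, and C<\G> with C</gc> used as a
simple tokenizer.  The token labels C<NUM> and C<OP> are hypothetical
names chosen for the example.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

my @starts = $text =~ /^(\w+)/g;     # without /m: ("first")
my @all    = $text =~ /^(\w+)/mg;    # with /m:    ("first", "second")

# \Z may match before a trailing newline; \z only at the very end.
my $Z = $text =~ /line\Z/ ? 1 : 0;   # 1
my $z = $text =~ /line\z/ ? 1 : 0;   # 0

# \G with /gc resumes where the previous match on $input stopped,
# and /c keeps pos() in place when an alternative fails.
my $input = "12+34";
my @tokens;
for (;;) {
    if    ($input =~ /\G(\d+)/gc)  { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([-+])/gc) { push @tokens, "OP:$1"  }
    else                           { last }
}

print "@starts | @all | $Z $z | @tokens\n";
```
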
=head2 QUANTIFIERS

Quantifiers are greedy by default -- match the B<longest> leftmost.

   Maximal Minimal Allowed range
   ------- ------- -------------
   {n,m}   {n,m}?  Must occur at least n times but no more than m times
   {n,}    {n,}?   Must occur at least n times
   {n}     {n}?    Must occur exactly n times
   *       *?      0 or more times (same as {0,})
   +       +?      1 or more times (same as {1,})
   ?       ??      0 or 1 time (same as {0,1})

There is no quantifier {,n} -- that gets understood as a literal string.

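The difference between maximal and minimal matching can be sketched
as follows, using an invented fragment of markup and a date-like
string.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy .+ runs to the last '>' it can reach.
my ($greedy) = $html =~ /<(.+)>/;    # "b>bold</b> and <i>italic</i"

# Minimal .+? stops at the first '>' that lets the match succeed.
my ($lazy)   = $html =~ /<(.+?)>/;   # "b"

# Counted forms: exactly 4 digits, then "-" plus 1 or 2 digits, twice.
my $is_date = "2004-7-15" =~ /^\d{4}(?:-\d{1,2}){2}$/ ? 1 : 0;

print "$greedy\n$lazy\n$is_date\n";
```
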
=head2 EXTENDED CONSTRUCTS

   (?#text)         A comment
   (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)          Zero-width positive lookahead assertion
   (?!...)          Zero-width negative lookahead assertion
   (?<=...)         Zero-width positive lookbehind assertion
   (?<!...)         Zero-width negative lookbehind assertion
   (?>...)          Grab what we can, prohibit backtracking
   (?{ code })      Embedded code, return value becomes $^R
   (??{ code })     Dynamic regex, return value used as regex
   (?(cond)yes|no)  cond being integer corresponding to capturing parens
   (?(cond)yes)        or a lookaround/eval zero-width assertion

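A sketch of several of these constructs, assuming nothing beyond core
Perl; the test strings are illustrative only.

```perl
use strict;
use warnings;

# Lookahead: digits followed by "px", without consuming "px".
my ($px) = "width: 120px" =~ /(\d+)(?=px)/;          # "120"

# Negative lookbehind: "bar" not preceded by "foo".
my @bars = "foobar bazbar" =~ /(?<!foo)(bar)/g;       # one match

# Inline modifier: only the clustered part is case-insensitive.
my $inline = "HELLO world" =~ /^(?i:hello) world$/ ? 1 : 0;   # 1

# (?>...) never gives characters back, so "ab" cannot match after it.
my $atomic = "aaab" =~ /^(?>a+)ab/ ? 1 : 0;           # 0

# Conditional: require ")" only if group 1 captured an opening "(".
my $cond = qr/^(\()?\d+(?(1)\))$/;
my @ok;
for my $candidate ("(42)", "42", "(42") {
    push @ok, ($candidate =~ $cond ? 1 : 0);
}

print "$px ", scalar(@bars), " $inline $atomic @ok\n";
```
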
=head2 VARIABLES

   $_    Default variable for operators to use
   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

   $&    Entire matched string
   $`    Everything prior to matched string
   $'    Everything after matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>.

   $1, $2 ...  hold the Xth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their I<opening> paren.

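As a sketch of the C<@-> and C<@+> alternative mentioned above, the
following rebuilds the values that the three slow variables would
hold, using C<substr> on the offsets, and then shows group numbering
by opening paren.  The variable names are chosen for the example.

```perl
use strict;
use warnings;

my $str = "key=value; other=thing";
my ($k, $v, $match, $before, $after);

if ($str =~ /(\w+)=(\w+)/) {
    ($k, $v) = ($1, $2);                            # "key", "value"

    # $-[0] and $+[0] are the start and end offsets of the whole match.
    $match  = substr($str, $-[0], $+[0] - $-[0]);   # the matched text
    $before = substr($str, 0, $-[0]);               # text before the match
    $after  = substr($str, $+[0]);                  # text after the match
}

# Groups are numbered by their opening paren, outermost first.
my @groups = "2004-07" =~ /((\d+)-(\d+))/;          # ("2004-07", "2004", "07")

print "$k|$v|$match|$before|$after\n@groups\n";
```
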
=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>.  For Titlecase, see L</Titlecase>.

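A brief sketch of some of the functions listed above working together
with regexes; the inputs are invented.

```perl
use strict;
use warnings;

# ucfirst and lc correspond to \u and \L in a string.
my $title = ucfirst lc "hELLO";                      # "Hello"

# quotemeta backslashes metacharacters, like \Q inside a pattern.
my $safe  = quotemeta "1+1=2";
my $found = "so 1+1=2 holds" =~ /$safe/ ? 1 : 0;     # 1

# pos reports where the last successful /g match on a string ended.
my $data  = "aXbXc";
my $where = $data =~ /X/g ? pos($data) : -1;         # 2

# split on a regex; captures in the separator are returned as well.
my @parts = split /\s*([,;])\s*/, "a , b;c";         # ("a", ",", "b", ";", "c")

print "$title $safe $found $where @parts\n";
```
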
=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept that is usually the same as uppercase, but differs
for certain characters such as the German "sharp s".

=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut