=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

  =~ determines to which variable the regex is applied.
     In its absence, $_ is used.

        $var =~ /foo/;

  !~ determines to which variable the regex is applied,
     and negates the result of the match; it returns
     false if the match succeeds, and true if it fails.

        $var !~ /foo/;

  m/pattern/igmsoxc searches a string for a pattern match,
     applying the given options.

        i  case-Insensitive
        g  Global - all occurrences
        m  Multiline mode - ^ and $ match internal lines
        s  match as a Single line - . matches \n
        o  compile pattern Once
        x  eXtended legibility - free whitespace and comments
        c  don't reset pos on failed matches when using /g

     If 'pattern' is an empty string, the last I<successfully> matched
     regex is used. Delimiters other than '/' may be used for both this
     operator and the following ones.
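
As a minimal sketch of the binding operators and a few of the m//
modifiers described above, the following applies them to a small sample
string. The string and all variable names are illustrative assumptions,
not part of this reference.

```perl
use strict;
use warnings;

# Hypothetical sample text: two lines, differing in case.
my $text = "Foo bar\nfoo baz\n";

# =~ binds the match to $text; /i ignores case.
my $has_foo = $text =~ /foo/i ? 1 : 0;

# !~ is true only when the match fails.
my $no_qux = $text !~ /qux/ ? 1 : 0;

# /g in list context returns every match; /i lets "Foo" and "foo" both count.
my @foos = $text =~ /foo/gi;

# /m lets ^ match at the start of each internal line.
my @line_starts = $text =~ /^(\w+)/gm;

# /s lets . match a newline, so the pattern can span both lines.
my $spans = $text =~ /bar.foo/s ? 1 : 0;

# /x allows whitespace and comments inside the pattern.
my ($second) = $text =~ / \s (ba\w) \b   # first word starting with "ba"
                        /x;

print "has_foo=$has_foo no_qux=$no_qux foos=@foos\n";
print "line_starts=@line_starts spans=$spans second=$second\n";
```

Note that a comment inside a /x pattern must not contain the pattern
delimiter, since the delimiter would end the pattern early.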

  qr/pattern/imsox lets you store a regex in a variable,
     or pass one around. Modifiers are as for m// and are stored
     within the regex.

  s/pattern/replacement/igmsoxe substitutes matches of
     'pattern' with 'replacement'. Modifiers are as for m//
     with one addition:

        e  Evaluate replacement as an expression

     'e' may be specified multiple times. 'replacement' is interpreted
     as a double quoted string unless a single-quote (') is the delimiter.

  ?pattern? is like m/pattern/ but matches only once. No alternate
     delimiters can be used. Must be reset with L<reset|perlfunc/reset>.

=head2 SYNTAX

   \       Escapes the character immediately following it
   .       Matches any single character except a newline (unless /s is used)
   ^       Matches at the beginning of the string (or line, if /m is used)
   $       Matches at the end of the string (or line, if /m is used)
   *       Matches the preceding element 0 or more times
   +       Matches the preceding element 1 or more times
   ?       Matches the preceding element 0 or 1 times
   {...}   Specifies a range of occurrences for the element preceding it
   [...]   Matches any one of the characters contained within the brackets
   (...)   Groups subexpressions for capturing to $1, $2...
   (?:...) Groups subexpressions without capturing (cluster)
   |       Matches either the subexpression preceding or following it
   \1, \2 ...  The text from the Nth group

=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End case modification or \Q quoting

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work inside or outside a character class.
The first six are locale aware, all are Unicode aware.
The default character class equivalents are given. See L<perllocale> and
L<perlunicode> for details.

   \d      A digit                     [0-9]
   \D      A nondigit                  [^0-9]
   \w      A word character            [a-zA-Z0-9_]
   \W      A non-word character        [^a-zA-Z0-9_]
   \s      A whitespace character      [ \t\n\r\f]
   \S      A non-whitespace character  [^ \t\n\r\f]

   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with long name
   \PP     Match non-P
   \P{...} Match lack of Unicode property with long name
   \X      Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum                   Alphanumeric
   alpha   IsAlpha                   Alphabetic
   ascii   IsASCII                   Any ASCII char
   blank   IsSpace      [ \t]        Horizontal whitespace (GNU extension)
   cntrl   IsCntrl                   Control characters
   digit   IsDigit      \d           Digits
   graph   IsGraph                   Alphanumeric and punctuation
   lower   IsLower                   Lowercase chars (locale and Unicode aware)
   print   IsPrint                   Alphanumeric, punct, and space
   punct   IsPunct                   Punctuation
   space   IsSpace      [\s\ck]      Whitespace
           IsSpacePerl  \s           Perl's whitespace definition
   upper   IsUpper                   Uppercase chars (locale and Unicode aware)
   word    IsWord       \w           Alphanumeric plus _ (Perl extension)
   xdigit  IsXDigit     [0-9A-Fa-f]  Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}

=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

=head2 QUANTIFIERS

Quantifiers are greedy by default -- they match the B<longest> leftmost.

   Maximal Minimal Allowed range
   ------- ------- -------------
   {n,m}   {n,m}?  Must occur at least n times but no more than m times
   {n,}    {n,}?   Must occur at least n times
   {n}     {n}?    Must occur exactly n times
   *       *?      0 or more times (same as {0,})
   +       +?      1 or more times (same as {1,})
   ?       ??      0 or 1 time (same as {0,1})

There is no quantifier {,n}; it is treated as a literal string.

=head2 EXTENDED CONSTRUCTS

   (?#text)         A comment
   (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)          Zero-width positive lookahead assertion
   (?!...)          Zero-width negative lookahead assertion
   (?<=...)         Zero-width positive lookbehind assertion
   (?<!...)         Zero-width negative lookbehind assertion
   (?>...)          Grab what we can, prohibit backtracking
   (?{ code })      Embedded code, return value becomes $^R
   (??{ code })     Dynamic regex, return value used as regex
   (?(cond)yes|no)  cond being integer corresponding to capturing parens
   (?(cond)yes)        or a lookaround/eval zero-width assertion

=head2 VARIABLES

   $_    Default variable for operators to use
   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

   $&    Entire matched string
   $`    Everything before the matched string
   $'    Everything after the matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>.

   $1, $2 ...  Hold the Nth captured expression
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their I<opening> paren.

=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.

=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept which most often is equal to uppercase, but for
certain characters like the German "sharp s" there is a difference.

=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut