=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

  =~ determines to which variable the regex is applied.
     In its absence, $_ is used.

        $var =~ /foo/;

  !~ determines to which variable the regex is applied,
     and negates the result of the match; it returns
     false if the match succeeds, and true if it fails.

        $var !~ /foo/;

  m/pattern/igmsoxc searches a string for a pattern match,
     applying the given options.

        i  case-Insensitive
        g  Global - all occurrences
        m  Multiline mode - ^ and $ match internal lines
        s  match as a Single line - . matches \n
        o  compile pattern Once
        x  eXtended legibility - free whitespace and comments
        c  don't reset pos on failed matches when using /g

     If 'pattern' is an empty string, the last *successfully* matched
     regex is used. Delimiters other than '/' may be used for both this
     operator and the following ones.

  qr/pattern/imsox lets you store a regex in a variable,
     or pass one around. Modifiers are as for m// and are stored
     within the regex.

  s/pattern/replacement/igmsoxe substitutes matches of
     'pattern' with 'replacement'. Modifiers are as for m//,
     with one addition:

        e  Evaluate replacement as an expression

     'e' may be specified multiple times. 'replacement' is interpreted
     as a double quoted string unless a single-quote (') is the delimiter.

  ?pattern? is like m/pattern/ but matches only once. No alternate
     delimiters can be used. Must be reset with reset() (see perlfunc).

=head2 SYNTAX

   \           Escapes the character immediately following it
   .           Matches any single character except a newline (unless /s is used)
   ^           Matches at the beginning of the string (or line, if /m is used)
   $           Matches at the end of the string (or line, if /m is used)
   *           Matches the preceding element 0 or more times
   +           Matches the preceding element 1 or more times
   ?           Matches the preceding element 0 or 1 times
   {...}       Specifies a range of occurrences for the element preceding it
   [...]       Matches any one of the characters contained within the brackets
   (...)       Groups subexpressions for capturing to $1, $2...
   (?:...)     Groups subexpressions without capturing (cluster)
   |           Matches either the subexpression preceding or following it
   \1, \2 ...  The text from the Nth group

=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End case modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work both inside and outside a character class.
The first six are locale aware; all are Unicode aware. The default
character class equivalents are given. See L<perllocale> and
L<perlunicode> for details.


   \d      A digit                     [0-9]
   \D      A nondigit                  [^0-9]
   \w      A word character            [a-zA-Z0-9_]
   \W      A non-word character        [^a-zA-Z0-9_]
   \s      A whitespace character      [ \t\n\r\f]
   \S      A non-whitespace character  [^ \t\n\r\f]

   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with long name
   \PP     Match non-P
   \P{...} Match lack of Unicode property with long name
   \X      Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

   POSIX   Unicode      Perl         Description
   -----   -------      ----         -----------
   alnum   IsAlnum                   Alphanumeric
   alpha   IsAlpha                   Alphabetic
   ascii   IsASCII                   Any ASCII char
   blank   IsSpace      [ \t]        Horizontal whitespace (GNU extension)
   cntrl   IsCntrl                   Control characters
   digit   IsDigit      \d           Digits
   graph   IsGraph                   Alphanumeric and punctuation
   lower   IsLower                   Lowercase chars (locale and Unicode aware)
   print   IsPrint                   Alphanumeric, punct, and space
   punct   IsPunct                   Punctuation
   space   IsSpace      [\s\ck]      Whitespace
           IsSpacePerl  \s           Perl's whitespace definition
   upper   IsUpper                   Uppercase chars (locale and Unicode aware)
   word    IsWord       \w           Alphanumeric plus _ (Perl extension)
   xdigit  IsXDigit     [0-9A-Fa-f]  Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}

=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

=head2 QUANTIFIERS

Quantifiers are greedy by default: they match the B<longest> leftmost
string.


   Maximal Minimal Allowed range
   ------- ------- -------------
   {n,m}   {n,m}?  Must occur at least n times but no more than m times
   {n,}    {n,}?   Must occur at least n times
   {n}     {n}?    Must occur exactly n times
   *       *?      0 or more times (same as {0,})
   +       +?      1 or more times (same as {1,})
   ?       ??      0 or 1 time (same as {0,1})

There is no quantifier C<{,n}>; that is understood as a literal string.

=head2 EXTENDED CONSTRUCTS

   (?#text)         A comment
   (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)          Zero-width positive lookahead assertion
   (?!...)          Zero-width negative lookahead assertion
   (?<=...)         Zero-width positive lookbehind assertion
   (?<!...)         Zero-width negative lookbehind assertion
   (?>...)          Grab what we can, prohibit backtracking
   (?{ code })      Embedded code, return value becomes $^R
   (??{ code })     Dynamic regex, return value used as regex
   (?(cond)yes|no)  cond being integer corresponding to capturing parens
   (?(cond)yes)        or a lookaround/eval zero-width assertion

=head2 VARIABLES

   $_    Default variable for operators to use
   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

   $&    Entire matched string
   $`    Everything prior to the matched string
   $'    Everything after the matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause the slowdown.
See also L<Devel::SawAmpersand>.

   $1, $2 ...  Hold the Nth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their I<opening> paren.

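
As a rough illustration of the numbering rule and of the C<@-> and
C<@+> arrays, the sketch below captures nested groups and then rebuilds
the equivalents of C<$`>, C<$&> and C<$'> with C<substr>. The sample
string and the variable names are made up for this example; only the
special variables come from the tables above.

    use strict;
    use warnings;

    # Hypothetical sample input, chosen only for illustration.
    my $str = "date: 2004-03-15 (Mon)";
    my (@groups, $pre, $match, $post);

    if ($str =~ /((\d{4})-(\d\d))-(\d\d)/) {
        # Groups are numbered by opening paren, left to right:
        # $1 is the outer year-month group, $2 and $3 are nested
        # inside it, and $4 is the day.
        @groups = ($1, $2, $3, $4);

        # Equivalents of $`, $& and $' built from @- and @+.
        $pre   = substr($str, 0, $-[0]);
        $match = substr($str, $-[0], $+[0] - $-[0]);
        $post  = substr($str, $+[0]);
    }

    print "groups: @groups\n";   # groups: 2004-03 2004 03 15
    print "pre=[$pre] match=[$match] post=[$post]\n";

The offsets in C<@-> and C<@+> are only meaningful after a successful
match, which is why the C<substr> calls sit inside the C<if> block.
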

=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.

=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept that is usually the same as uppercase, but differs
for certain characters such as the German "sharp s".

=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut