.\"	$OpenBSD: re_format.7,v 1.14 2007/05/31 19:19:30 jmc Exp $
.\"
.\" Copyright (c) 1997, Phillip F Knaack. All rights reserved.
.\"
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
.\" Copyright (c) 1992, 1993, 1994
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)re_format.7	8.3 (Berkeley) 3/20/94
.\"
.Dd $Mdocdate: May 31 2007 $
.Dt RE_FORMAT 7
.Os
.Sh NAME
.Nm re_format
.Nd POSIX regular expressions
.Sh DESCRIPTION
Regular expressions (REs),
as defined in
.St -p1003.1-2004 ,
come in two forms:
basic regular expressions
(BREs)
and extended regular expressions
(EREs).
Both forms of regular expressions are supported
by the interfaces described in
.Xr regex 3 .
Applications dealing with regular expressions
may use one or the other form
(or indeed both).
For example,
.Xr ed 1
uses BREs,
whilst
.Xr egrep 1
uses EREs.
Consult the manual page for the specific application to find out which
it uses.
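.Pp
As a minimal sketch of how an application selects one form or the other,
the following assumes the POSIX
regcomp(), regexec() and regfree() interfaces from regex(3);
the helper name re_matches() is hypothetical and exists only for illustration.
Passing REG_EXTENDED compiles the pattern as an ERE; omitting it compiles
the same text as a BRE.

```c
#include <regex.h>
#include <stddef.h>

/*
 * re_matches() is a hypothetical helper, not part of regex(3).
 * It returns 1 if pattern matches somewhere in text, 0 if it does not,
 * and -1 if pattern fails to compile.
 * Pass REG_EXTENDED in cflags for an ERE, or 0 for a BRE.
 */
int
re_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only success or failure is reported, no offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

Under these assumptions, re_matches("a+b", "aab", REG_EXTENDED) should
report a match, whereas re_matches("a+b", "aab", 0) should not, because
a plus sign is an ordinary character in a BRE.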
.Pp
POSIX leaves some aspects of RE syntax and semantics open;
.Sq **
marks decisions on these aspects that
may not be fully portable to other POSIX implementations.
.Pp
This manual page first describes regular expressions in general,
specifically extended regular expressions,
and then discusses differences between them and basic regular expressions.
.Sh EXTENDED REGULAR EXPRESSIONS
An ERE is one** or more non-empty**
.Em branches ,
separated by
.Sq \*(Ba .
It matches anything that matches one of the branches.
.Pp
A branch is one** or more
.Em pieces ,
concatenated.
It matches a match for the first, followed by a match for the second, etc.
.Pp
A piece is an
.Em atom
possibly followed by a single**
.Sq * ,
.Sq + ,
.Sq ?\& ,
or
.Em bound .
An atom followed by
.Sq *
matches a sequence of 0 or more matches of the atom.
An atom followed by
.Sq +
matches a sequence of 1 or more matches of the atom.
An atom followed by
.Sq ?\&
matches a sequence of 0 or 1 matches of the atom.
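.Pp
The three repetition operators can be compared with a short sketch.
It assumes the POSIX interfaces from regex(3);
ere_span() is a hypothetical helper written for this illustration,
and it reports the byte offsets of the first match.

```c
#include <regex.h>

/*
 * ere_span() is a hypothetical helper, not part of regex(3).
 * It compiles ere as an extended RE and stores the offsets of the
 * first match in text in *start and *end.
 * Returns 0 on a match, 1 on no match, and -1 on a compile error.
 */
int
ere_span(const char *ere, const char *text, int *start, int *end)
{
	regex_t re;
	regmatch_t m[1];
	int rc;

	if (regcomp(&re, ere, REG_EXTENDED) != 0)
		return -1;
	rc = regexec(&re, text, 1, m, 0);
	regfree(&re);
	if (rc != 0)
		return 1;
	/* m[0] describes the whole match: [rm_so, rm_eo). */
	*start = (int)m[0].rm_so;
	*end = (int)m[0].rm_eo;
	return 0;
}
```

With this helper, "ab*c" should match all of "ac" (zero repetitions are
allowed), "ab+c" should not match "ac" (at least one is required),
and "ab?c" should not match "abbc" (at most one is allowed).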
.Pp
A bound is
.Sq {
followed by an unsigned decimal integer,
possibly followed by
.Sq ,\&
possibly followed by another unsigned decimal integer,
always followed by
.Sq } .
The integers must lie between 0 and
.Dv RE_DUP_MAX
(255**) inclusive,
and if there are two of them, the first may not exceed the second.
An atom followed by a bound containing one integer
.Ar i
and no comma matches
a sequence of exactly
.Ar i
matches of the atom.
An atom followed by a bound
containing one integer
.Ar i
and a comma matches
a sequence of
.Ar i
or more matches of the atom.
An atom followed by a bound
containing two integers
.Ar i
and
.Ar j
matches a sequence of
.Ar i
through
.Ar j
(inclusive) matches of the atom.
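.Pp
The three bound forms can be illustrated with a sketch along the same lines.
It assumes the POSIX interfaces from regex(3);
ere_match_len() is a hypothetical helper introduced only for this example,
and it returns the length of the first match.

```c
#include <regex.h>

/*
 * ere_match_len() is a hypothetical helper, not part of regex(3).
 * It returns the length in bytes of the first match of ere in text,
 * -1 if there is no match, and -2 if ere fails to compile.
 */
int
ere_match_len(const char *ere, const char *text)
{
	regex_t re;
	regmatch_t m[1];
	int rc;

	if (regcomp(&re, ere, REG_EXTENDED) != 0)
		return -2;
	rc = regexec(&re, text, 1, m, 0);
	regfree(&re);
	if (rc != 0)
		return -1;
	return (int)(m[0].rm_eo - m[0].rm_so);
}
```

Matched against "aaaa", the ERE "a{2}" should yield a match of length 2,
"a{2,}" a match of length 4, and "a{2,3}" a match of length 3;
"a{3}" should not match "aa" at all.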
.Pp
An atom is a regular expression enclosed in
.Sq ()
(matching a part of the regular expression),
an empty set of
.Sq ()
(matching the null string)**,
a
.Em bracket expression
(see below),
.Sq .\&
(matching any single character),
.Sq ^
(matching the null string at the beginning of a line),
.Sq $
(matching the null string at the end of a line),
a
.Sq \e
followed by one of the characters
.Sq ^.[$()|*+?{\e
(matching that character taken as an ordinary character),
a
.Sq \e
followed by any other character**
(matching that character taken as an ordinary character,
as if the
.Sq \e
had not been present**),
or a single character with no other significance (matching that character).
A
.Sq {
followed by a character other than a digit is an ordinary character,
not the beginning of a bound**.
It is illegal to end an RE with
.Sq \e .
.Pp
A bracket expression is a list of characters enclosed in
.Sq [] .
It normally matches any single character from the list (but see below).
If the list begins with
.Sq ^ ,
it matches any single character
.Em not
from the rest of the list
(but see below).
If two characters in the list are separated by
.Sq - ,
this is shorthand for the full
.Em range
of characters between those two (inclusive) in the
collating sequence, e.g.\&
.Sq [0-9]
in ASCII matches any decimal digit.
It is illegal** for two ranges to share an endpoint, e.g.\&
.Sq a-c-e .
Ranges are very collating-sequence-dependent,
and portable programs should avoid relying on them.
.Pp
To include a literal
.Sq ]\&
in the list, make it the first character
(following a possible
.Sq ^ ) .
To include a literal
.Sq - ,
make it the first or last character,
or the second endpoint of a range.
To use a literal
.Sq -
as the first endpoint of a range,
enclose it in
.Sq [.
and
.Sq .]
to make it a collating element (see below).
With the exception of these and some combinations using
.Sq [
(see next paragraphs),
all other special characters, including
.Sq \e ,
lose their special significance within a bracket expression.
.Pp
Within a bracket expression, a collating element
(a character,
a multi-character sequence that collates as if it were a single character,
or a collating-sequence name for either)
enclosed in
.Sq [.
and
.Sq .]
stands for the sequence of characters of that collating element.
The sequence is a single element of the bracket expression's list.
A bracket expression containing a multi-character collating element
can thus match more than one character,
e.g. if the collating sequence includes a
.Sq ch
collating element,
then the RE
.Sq [[.ch.]]*c
matches the first five characters of
.Sq chchcc .
.Pp
Within a bracket expression, a collating element enclosed in
.Sq [=
and
.Sq =]
is an equivalence class, standing for the sequences of characters
of all collating elements equivalent to that one, including itself.
(If there are no other equivalent collating elements,
the treatment is as if the enclosing delimiters were
.Sq [.
and
.Sq .] . )
For example, if
.Sq x
and
.Sq y
are the members of an equivalence class,
then
.Sq [[=x=]] ,
.Sq [[=y=]] ,
and
.Sq [xy]
are all synonymous.
An equivalence class may not** be an endpoint of a range.
.Pp
Within a bracket expression, the name of a
.Em character class
enclosed
in
.Sq [:
and
.Sq :]
stands for the list of all characters belonging to that class.
Standard character class names are:
.Bd -literal -offset indent
alnum	digit	punct
alpha	graph	space
blank	lower	upper
cntrl	print	xdigit
.Ed
.Pp
These stand for the character classes defined in
.Xr ctype 3 .
A locale may provide others.
A character class may not be used as an endpoint of a range.
.Pp
There are two special cases** of bracket expressions:
the bracket expressions
.Sq [[:<:]]
and
.Sq [[:>:]]
match the null string at the beginning and end of a word, respectively.
A word is defined as a sequence of
characters starting and ending with a word character
which is neither preceded nor followed by
word characters.
A word character is an
.Em alnum
character (as defined by
.Xr ctype 3 )
or an underscore.
This is an extension,
compatible with but not specified by POSIX,
and should be used with
caution in software intended to be portable to other systems.
.Pp
In the event that an RE could match more than one substring of a given
string,
the RE matches the one starting earliest in the string.
If the RE could match more than one substring starting at that point,
it matches the longest.
Subexpressions also match the longest possible substrings, subject to
the constraint that the whole match be as long as possible,
with subexpressions starting earlier in the RE taking priority over
ones starting later.
Note that higher-level subexpressions thus take priority over
their lower-level component subexpressions.
.Pp
Match lengths are measured in characters, not collating elements.
A null string is considered longer than no match at all.
For example,
.Sq bb*
matches the three middle characters of
.Sq abbbc ;
.Sq (wee|week)(knights|nights)
matches all ten characters of
.Sq weeknights ;
when
.Sq (.*).*
is matched against
.Sq abc ,
the parenthesized subexpression matches all three characters;
and when
.Sq (a*)*
is matched against
.Sq bc ,
both the whole RE and the parenthesized subexpression match the null string.
.Pp
If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
ordinary character outside a bracket expression, it is effectively
transformed into a bracket expression containing both cases,
e.g.\&
.Sq x
becomes
.Sq [xX] .
When it appears inside a bracket expression,
all case counterparts of it are added to the bracket expression,
so that, for example,
.Sq [x]
becomes
.Sq [xX]
and
.Sq [^x]
becomes
.Sq [^xX] .
.Pp
No particular limit is imposed on the length of REs**.
Programs intended to be portable should not employ REs longer
than 256 bytes,
as an implementation can refuse to accept such REs and remain
POSIX-compliant.
.Pp
The following is a list of extended regular expressions:
.Bl -tag -width Ds
.It Ar c
Any character
.Ar c
not listed below matches itself.
.It \e Ns Ar c
Any backslash-escaped character
.Ar c
matches itself.
.It \&.
Matches any single character that is not a newline
.Pq Sq \en .
.It Bq Ar char-class
Matches any single character in
.Ar char-class .
To include a
.Ql \&]
in
.Ar char-class ,
it must be the first character.
A range of characters may be specified by separating the end characters
of the range with a
.Ql - ;
e.g.\&
.Ar a-z
specifies the lower case characters.
The following literal expressions can also be used in
.Ar char-class
to specify sets of characters:
.Bd -unfilled -offset indent
[:alnum:]  [:cntrl:]  [:lower:]  [:space:]
[:alpha:]  [:digit:]  [:print:]  [:upper:]
[:blank:]  [:graph:]  [:punct:]  [:xdigit:]
.Ed
.Pp
If
.Ql -
appears as the first or last character of
.Ar char-class ,
then it matches itself.
All other characters in
.Ar char-class
match themselves.
.Pp
Patterns in
.Ar char-class
of the form
.Eo [.
.Ar col-elm
.Ec .]\&
or
.Eo [=
.Ar col-elm
.Ec =]\& ,
where
.Ar col-elm
is a collating element, are interpreted according to
.Xr setlocale 3
.Pq not currently supported .
.It Bq ^ Ns Ar char-class
Matches any single character, other than newline, not in
.Ar char-class .
.Ar char-class
is defined as above.
.It ^
If
.Sq ^
is the first character of a regular expression, then it
anchors the regular expression to the beginning of a line.
Otherwise, it matches itself.
.It $
If
.Sq $
is the last character of a regular expression,
it anchors the regular expression to the end of a line.
Otherwise, it matches itself.
.It [[:<:]]
Anchors the single character regular expression or subexpression
immediately following it to the beginning of a word.
.It [[:>:]]
Anchors the single character regular expression or subexpression
immediately following it to the end of a word.
.It Pq Ar re
Defines a subexpression
.Ar re .
Any set of characters enclosed in parentheses
matches whatever the set of characters without parentheses matches
(that is a long-winded way of saying the constructs
.Sq (re)
and
.Sq re
match identically).
.It *
Matches the single character regular expression or subexpression
immediately preceding it zero or more times.
If
.Sq *
is the first character of a regular expression or subexpression,
then it matches itself.
The
.Sq *
operator sometimes yields unexpected results.
For example, the regular expression
.Ar b*
matches the beginning of the string
.Qq abbb
(as opposed to the substring
.Qq bbb ) ,
since a null match is the only leftmost match.
.It +
Matches the single character regular expression
or subexpression immediately preceding it
one or more times.
.It ?
Matches the single character regular expression
or subexpression immediately preceding it
0 or 1 times.
.Sm off
.It Xo
.Pf { Ar n , m No }\ \&
.Pf { Ar n , No }\ \&
.Pf { Ar n No }
.Xc
.Sm on
Matches the single character regular expression or subexpression
immediately preceding it at least
.Ar n
and at most
.Ar m
times.
If
.Ar m
is omitted, then it matches at least
.Ar n
times.
If the comma is also omitted, then it matches exactly
.Ar n
times.
.It \*(Ba
Used to separate patterns.
For example,
the pattern
.Sq cat\*(Badog
matches either
.Sq cat
or
.Sq dog .
.El
.Sh BASIC REGULAR EXPRESSIONS
Basic regular expressions differ in several respects:
.Bl -bullet -offset 3n
.It
.Sq \*(Ba ,
.Sq + ,
and
.Sq ?\&
are ordinary characters and there is no equivalent
for their functionality.
.It
The delimiters for bounds are
.Sq \e{
and
.Sq \e} ,
with
.Sq {
and
.Sq }
by themselves ordinary characters.
.It
The parentheses for nested subexpressions are
.Sq \e(
and
.Sq \e) ,
with
.Sq (
and
.Sq )\&
by themselves ordinary characters.
.It
.Sq ^
is an ordinary character except at the beginning of the
RE or** the beginning of a parenthesized subexpression.
.It
.Sq $
is an ordinary character except at the end of the
RE or** the end of a parenthesized subexpression.
.It
.Sq *
is an ordinary character if it appears at the beginning of the
RE or the beginning of a parenthesized subexpression
(after a possible leading
.Sq ^ ) .
.It
Finally, there is one new type of atom, a
.Em back-reference :
.Sq \e
followed by a non-zero decimal digit
.Ar d
matches the same sequence of characters matched by the
.Ar d Ns th
parenthesized subexpression
(numbering subexpressions by the positions of their opening parentheses,
left to right),
so that, for example,
.Sq \e([bc]\e)\e1
matches
.Sq bb\&
or
.Sq cc
but not
.Sq bc .
.El
.Pp
The following is a list of basic regular expressions:
.Bl -tag -width Ds
.It Ar c
Any character
.Ar c
not listed below matches itself.
.It \e Ns Ar c
Any backslash-escaped character
.Ar c ,
except for
.Sq { ,
.Sq } ,
.Sq \&( ,
and
.Sq \&) ,
matches itself.
.It \&.
Matches any single character that is not a newline
.Pq Sq \en .
.It Bq Ar char-class
Matches any single character in
.Ar char-class .
To include a
.Ql \&]
in
.Ar char-class ,
it must be the first character.
A range of characters may be specified by separating the end characters
of the range with a
.Ql - ;
e.g.\&
.Ar a-z
specifies the lower case characters.
The following literal expressions can also be used in
.Ar char-class
to specify sets of characters:
.Bd -unfilled -offset indent
[:alnum:]  [:cntrl:]  [:lower:]  [:space:]
[:alpha:]  [:digit:]  [:print:]  [:upper:]
[:blank:]  [:graph:]  [:punct:]  [:xdigit:]
.Ed
.Pp
If
.Ql -
appears as the first or last character of
.Ar char-class ,
then it matches itself.
All other characters in
.Ar char-class
match themselves.
.Pp
Patterns in
.Ar char-class
of the form
.Eo [.
.Ar col-elm
.Ec .]\&
or
.Eo [=
.Ar col-elm
.Ec =]\& ,
where
.Ar col-elm
is a collating element, are interpreted according to
.Xr setlocale 3
.Pq not currently supported .
.It Bq ^ Ns Ar char-class
Matches any single character, other than newline, not in
.Ar char-class .
.Ar char-class
is defined as above.
.It ^
If
.Sq ^
is the first character of a regular expression, then it
anchors the regular expression to the beginning of a line.
Otherwise, it matches itself.
.It $
If
.Sq $
is the last character of a regular expression,
it anchors the regular expression to the end of a line.
Otherwise, it matches itself.
.It [[:<:]]
Anchors the single character regular expression or subexpression
immediately following it to the beginning of a word.
.It [[:>:]]
Anchors the single character regular expression or subexpression
immediately following it to the end of a word.
.It \e( Ns Ar re Ns \e)
Defines a subexpression
.Ar re .
Subexpressions may be nested.
A subsequent backreference of the form
.Pf \e Ns Ar n ,
where
.Ar n
is a number in the range [1,9], expands to the text matched by the
.Ar n Ns th
subexpression.
For example, the regular expression
.Ar \e(.*\e)\e1
matches any string consisting of identical adjacent substrings.
Subexpressions are ordered relative to their left delimiter.
.It *
Matches the single character regular expression or subexpression
immediately preceding it zero or more times.
If
.Sq *
is the first character of a regular expression or subexpression,
then it matches itself.
The
.Sq *
operator sometimes yields unexpected results.
For example, the regular expression
.Ar b*
matches the beginning of the string
.Qq abbb
(as opposed to the substring
.Qq bbb ) ,
since a null match is the only leftmost match.
.Sm off
.It Xo
.Pf \e{ Ar n , m No \e}\ \&
.Pf \e{ Ar n , No \e}\ \&
.Pf \e{ Ar n No \e}
.Xc
.Sm on
Matches the single character regular expression or subexpression
immediately preceding it at least
.Ar n
and at most
.Ar m
times.
If
.Ar m
is omitted, then it matches at least
.Ar n
times.
If the comma is also omitted, then it matches exactly
.Ar n
times.
.El
.Sh SEE ALSO
.Xr ctype 3 ,
.Xr regex 3
.Sh STANDARDS
.St -p1003.1-2004 :
Base Definitions, Chapter 9 (Regular Expressions).
.Sh BUGS
Having two kinds of REs is a botch.
.Pp
The current POSIX spec says that
.Sq )\&
is an ordinary character in the absence of an unmatched
.Sq ( ;
this was an unintentional result of a wording error,
and change is likely.
Avoid relying on it.
.Pp
Back-references are a dreadful botch,
posing major problems for efficient implementations.
They are also somewhat vaguely defined
(does
.Sq a\e(\e(b\e)*\e2\e)*d
match
.Sq abbbd ? ) .
Avoid using them.
.Pp
POSIX's specification of case-independent matching is vague.
The
.Dq one case implies all cases
definition given above
is the current consensus among implementors as to the right interpretation.
.Pp
The syntax for word boundaries is incredibly ugly.