.\"	$OpenBSD: re_format.7,v 1.14 2007/05/31 19:19:30 jmc Exp $
.\"
.\" Copyright (c) 1997, Phillip F Knaack. All rights reserved.
.\"
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
.\" Copyright (c) 1992, 1993, 1994
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)re_format.7	8.3 (Berkeley) 3/20/94
.\"
.Dd $Mdocdate: May 31 2007 $
.Dt RE_FORMAT 7
.Os
.Sh NAME
.Nm re_format
.Nd POSIX regular expressions
.Sh DESCRIPTION
Regular expressions (REs),
as defined in
.St -p1003.1-2004 ,
come in two forms:
basic regular expressions
(BREs)
and extended regular expressions
(EREs).
Both forms of regular expressions are supported
by the interfaces described in
.Xr regex 3 .
Applications dealing with regular expressions
may use one or the other form
(or indeed both).
For example,
.Xr ed 1
uses BREs,
whilst
.Xr egrep 1
uses EREs.
Consult the manual page for the specific application to find out which
it uses.
.Pp
POSIX leaves some aspects of RE syntax and semantics open;
.Sq **
marks decisions on these aspects that
may not be fully portable to other POSIX implementations.
.Pp
This manual page first describes regular expressions in general,
specifically extended regular expressions,
and then discusses differences between them and basic regular expressions.
.Sh EXTENDED REGULAR EXPRESSIONS
An ERE is one** or more non-empty**
.Em branches ,
separated by
.Sq \*(Ba .
It matches anything that matches one of the branches.
.Pp
A branch is one** or more
.Em pieces ,
concatenated.
It matches a match for the first, followed by a match for the second, etc.
.Pp
A piece is an
.Em atom
possibly followed by a single**
.Sq * ,
.Sq + ,
.Sq ?\& ,
or
.Em bound .
An atom followed by
.Sq *
matches a sequence of 0 or more matches of the atom.
An atom followed by
.Sq +
matches a sequence of 1 or more matches of the atom.
An atom followed by
.Sq ?\&
matches a sequence of 0 or 1 matches of the atom.
.Pp
A bound is
.Sq {
followed by an unsigned decimal integer,
possibly followed by
.Sq ,\&
possibly followed by another unsigned decimal integer,
always followed by
.Sq } .
The integers must lie between 0 and
.Dv RE_DUP_MAX
(255**) inclusive,
and if there are two of them, the first may not exceed the second.
An atom followed by a bound containing one integer
.Ar i
and no comma matches
a sequence of exactly
.Ar i
matches of the atom.
An atom followed by a bound
containing one integer
.Ar i
and a comma matches
a sequence of
.Ar i
or more matches of the atom.
An atom followed by a bound
containing two integers
.Ar i
and
.Ar j
matches a sequence of
.Ar i
through
.Ar j
(inclusive) matches of the atom.
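.Pp
The bound rules above can be sketched in C using the
.Fn regcomp
and
.Fn regexec
interfaces from
.Xr regex 3 .
This is only an illustration:
the helper name
.Fn ere_match_len
is hypothetical and not part of any library.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of regex(3)): compile `pattern` as an
 * ERE, search `text`, and return the length of the overall match.
 * Returns -1 if the pattern does not compile or nothing matches.
 */
long
ere_match_len(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m[1];
	long len = -1;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;	/* e.g. an invalid bound */
	if (regexec(&re, text, 1, m, 0) == 0)
		len = (long)(m[0].rm_eo - m[0].rm_so);
	regfree(&re);
	return len;
}
```
.Ed
.Pp
Under these assumptions,
.Sq a{2,3}
searched in
.Sq aaaa
should report a match of length 3,
.Sq a{2,}
a match of length 4,
and
.Sq a{3,2}
should be rejected at compile time because the first integer
exceeds the second.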
.Pp
An atom is a regular expression enclosed in
.Sq ()
(matching a part of the regular expression),
an empty set of
.Sq ()
(matching the null string)**,
a
.Em bracket expression
(see below),
.Sq .\&
(matching any single character),
.Sq ^
(matching the null string at the beginning of a line),
.Sq $
(matching the null string at the end of a line),
a
.Sq \e
followed by one of the characters
.Sq ^.[$()|*+?{\e
(matching that character taken as an ordinary character),
a
.Sq \e
followed by any other character**
(matching that character taken as an ordinary character,
as if the
.Sq \e
had not been present**),
or a single character with no other significance (matching that character).
A
.Sq {
followed by a character other than a digit is an ordinary character,
not the beginning of a bound**.
It is illegal to end an RE with
.Sq \e .
.Pp
A bracket expression is a list of characters enclosed in
.Sq [] .
It normally matches any single character from the list (but see below).
If the list begins with
.Sq ^ ,
it matches any single character
.Em not
from the rest of the list
(but see below).
If two characters in the list are separated by
.Sq - ,
this is shorthand for the full
.Em range
of characters between those two (inclusive) in the
collating sequence, e.g.\&
.Sq [0-9]
in ASCII matches any decimal digit.
It is illegal** for two ranges to share an endpoint, e.g.\&
.Sq a-c-e .
Ranges are very collating-sequence-dependent,
and portable programs should avoid relying on them.
.Pp
To include a literal
.Sq ]\&
in the list, make it the first character
(following a possible
.Sq ^ ) .
To include a literal
.Sq - ,
make it the first or last character,
or the second endpoint of a range.
To use a literal
.Sq -
as the first endpoint of a range,
enclose it in
.Sq [.
and
.Sq .]
to make it a collating element (see below).
With the exception of these and some combinations using
.Sq [
(see next paragraphs),
all other special characters, including
.Sq \e ,
lose their special significance within a bracket expression.
.Pp
Within a bracket expression, a collating element
(a character,
a multi-character sequence that collates as if it were a single character,
or a collating-sequence name for either)
enclosed in
.Sq [.
and
.Sq .]
stands for the sequence of characters of that collating element.
The sequence is a single element of the bracket expression's list.
A bracket expression containing a multi-character collating element
can thus match more than one character,
e.g.\& if the collating sequence includes a
.Sq ch
collating element,
then the RE
.Sq [[.ch.]]*c
matches the first five characters of
.Sq chchcc .
.Pp
Within a bracket expression, a collating element enclosed in
.Sq [=
and
.Sq =]
is an equivalence class, standing for the sequences of characters
of all collating elements equivalent to that one, including itself.
(If there are no other equivalent collating elements,
the treatment is as if the enclosing delimiters were
.Sq [.
and
.Sq .] . )
For example, if
.Sq x
and
.Sq y
are the members of an equivalence class,
then
.Sq [[=x=]] ,
.Sq [[=y=]] ,
and
.Sq [xy]
are all synonymous.
An equivalence class may not** be an endpoint of a range.
.Pp
Within a bracket expression, the name of a
.Em character class
enclosed
in
.Sq [:
and
.Sq :]
stands for the list of all characters belonging to that class.
Standard character class names are:
.Bd -literal -offset indent
alnum	digit	punct
alpha	graph	space
blank	lower	upper
cntrl	print	xdigit
.Ed
.Pp
These stand for the character classes defined in
.Xr ctype 3 .
A locale may provide others.
A character class may not be used as an endpoint of a range.
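.Pp
As a small illustration of a character class inside a bracket expression,
the following C sketch tests whether a string consists solely of
hexadecimal digits.
It assumes the
.Xr regex 3
interfaces and the default
.Qq C
locale;
the function name
.Fn is_hex_word
is hypothetical and chosen only for this example.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if `text` is one or more hexadecimal
 * digits and nothing else, 0 if it is not, -1 if compilation fails.
 */
int
is_hex_word(const char *text)
{
	regex_t re;
	int rc;

	assert(text != NULL);
	/* [:xdigit:] names the class; the outer [] is the bracket expression. */
	if (regcomp(&re, "^[[:xdigit:]]+$", REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```
.Ed
.Pp
With this sketch,
.Sq c0ffee
should be accepted and
.Sq coffee
rejected, since
.Sq o
is not in the
.Em xdigit
class.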
.Pp
There are two special cases** of bracket expressions:
the bracket expressions
.Sq [[:<:]]
and
.Sq [[:>:]]
match the null string at the beginning and end of a word, respectively.
A word is defined as a sequence of
characters starting and ending with a word character
which is neither preceded nor followed by
word characters.
A word character is an
.Em alnum
character (as defined by
.Xr ctype 3 )
or an underscore.
This is an extension,
compatible with but not specified by POSIX,
and should be used with
caution in software intended to be portable to other systems.
.Pp
In the event that an RE could match more than one substring of a given
string,
the RE matches the one starting earliest in the string.
If the RE could match more than one substring starting at that point,
it matches the longest.
Subexpressions also match the longest possible substrings, subject to
the constraint that the whole match be as long as possible,
with subexpressions starting earlier in the RE taking priority over
ones starting later.
Note that higher-level subexpressions thus take priority over
their lower-level component subexpressions.
.Pp
Match lengths are measured in characters, not collating elements.
A null string is considered longer than no match at all.
For example,
.Sq bb*
matches the three middle characters of
.Sq abbbc ;
.Sq (wee|week)(knights|nights)
matches all ten characters of
.Sq weeknights ;
when
.Sq (.*).*
is matched against
.Sq abc ,
the parenthesized subexpression matches all three characters;
and when
.Sq (a*)*
is matched against
.Sq bc ,
both the whole RE and the parenthesized subexpression match the null string.
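.Pp
The earliest-then-longest rule can be observed through the offsets that
.Fn regexec
reports for the overall match.
The following C sketch is an illustration only;
.Vt struct span
and
.Fn ere_span
are hypothetical names, not part of
.Xr regex 3 .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Offsets of the overall match; both are -1 when there is none. */
struct span {
	int start;
	int end;
};

/*
 * Hypothetical helper: compile `pattern` as an ERE and report where
 * the overall match falls in `text`.
 */
struct span
ere_span(const char *pattern, const char *text)
{
	struct span s = { -1, -1 };
	regex_t re;
	regmatch_t m[1];

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return s;
	if (regexec(&re, text, 1, m, 0) == 0) {
		s.start = (int)m[0].rm_so;	/* first matched position */
		s.end = (int)m[0].rm_eo;	/* one past the last */
	}
	regfree(&re);
	return s;
}
```
.Ed
.Pp
For
.Sq bb*
against
.Sq abbbc
the sketch should report offsets 1 and 4;
for
.Sq (a*)*
against
.Sq bc
it should report 0 and 0, a null match at the start of the string.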
.Pp
If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
ordinary character outside a bracket expression, it is effectively
transformed into a bracket expression containing both cases,
e.g.\&
.Sq x
becomes
.Sq [xX] .
When it appears inside a bracket expression,
all case counterparts of it are added to the bracket expression,
so that, for example,
.Sq [x]
becomes
.Sq [xX]
and
.Sq [^x]
becomes
.Sq [^xX] .
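.Pp
With the
.Xr regex 3
interfaces, case-independent matching is requested with the
.Dv REG_ICASE
compilation flag.
The following C sketch shows the transformations described above;
.Fn icase_matches
is a hypothetical helper name used only here.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the ERE `pattern`, compiled with
 * REG_ICASE, matches somewhere in `text`; 0 if not; -1 on a
 * compilation error.
 */
int
icase_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern,
	    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```
.Ed
.Pp
Here
.Sq x
and
.Sq [x]
should both match
.Sq X ,
while
.Sq ^[^x]$
should not, because
.Sq [^x]
is treated as
.Sq [^xX] .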
.Pp
No particular limit is imposed on the length of REs**.
Programs intended to be portable should not employ REs longer
than 256 bytes,
as an implementation can refuse to accept such REs and remain
POSIX-compliant.
.Pp
The following is a list of extended regular expressions:
.Bl -tag -width Ds
.It Ar c
Any character
.Ar c
not listed below matches itself.
.It \e Ns Ar c
Any backslash-escaped character
.Ar c
matches itself.
.It \&.
Matches any single character that is not a newline
.Pq Sq \en .
.It Bq Ar char-class
Matches any single character in
.Ar char-class .
To include a
.Ql \&]
in
.Ar char-class ,
it must be the first character.
A range of characters may be specified by separating the end characters
of the range with a
.Ql - ;
e.g.\&
.Ar a-z
specifies the lower case characters.
The following literal expressions can also be used in
.Ar char-class
to specify sets of characters:
.Bd -unfilled -offset indent
[:alnum:] [:cntrl:] [:lower:] [:space:]
[:alpha:] [:digit:] [:print:] [:upper:]
[:blank:] [:graph:] [:punct:] [:xdigit:]
.Ed
.Pp
If
.Ql -
appears as the first or last character of
.Ar char-class ,
then it matches itself.
All other characters in
.Ar char-class
match themselves.
.Pp
Patterns in
.Ar char-class
of the form
.Eo [.
.Ar col-elm
.Ec .]\&
or
.Eo [=
.Ar col-elm
.Ec =]\& ,
where
.Ar col-elm
is a collating element, are interpreted according to
.Xr setlocale 3
.Pq not currently supported .
.It Bq ^ Ns Ar char-class
Matches any single character, other than newline, not in
.Ar char-class .
.Ar char-class
is defined as above.
.It ^
Matches the null string at the beginning of a line,
anchoring what follows it there.
Unlike in basic regular expressions,
.Sq ^
is special wherever it appears in an ERE outside a bracket expression.
.It $
Matches the null string at the end of a line,
anchoring what precedes it there.
Unlike in basic regular expressions,
.Sq $
is special wherever it appears in an ERE outside a bracket expression.
.It [[:<:]]
Anchors the single character regular expression or subexpression
immediately following it to the beginning of a word.
.It [[:>:]]
Anchors the single character regular expression or subexpression
immediately preceding it to the end of a word.
.It Pq Ar re
Defines a subexpression
.Ar re .
Any set of characters enclosed in parentheses
matches whatever the set of characters without parentheses matches
(that is a long-winded way of saying the constructs
.Sq (re)
and
.Sq re
match identically).
.It *
Matches the single character regular expression or subexpression
immediately preceding it zero or more times.
If
.Sq *
is the first character of a regular expression or subexpression,
then it matches itself.
The
.Sq *
operator sometimes yields unexpected results.
For example, the regular expression
.Ar b*
matches the beginning of the string
.Qq abbb
(as opposed to the substring
.Qq bbb ) ,
since a null match is the only leftmost match.
.It +
Matches the single character regular expression
or subexpression immediately preceding it
one or more times.
.It ?
Matches the single character regular expression
or subexpression immediately preceding it
0 or 1 times.
.Sm off
.It Xo
.Pf { Ar n , m No }\ \&
.Pf { Ar n , No }\ \&
.Pf { Ar n No }
.Xc
.Sm on
Matches the single character regular expression or subexpression
immediately preceding it at least
.Ar n
and at most
.Ar m
times.
If
.Ar m
is omitted, then it matches at least
.Ar n
times.
If the comma is also omitted, then it matches exactly
.Ar n
times.
.It \*(Ba
Used to separate patterns.
For example,
the pattern
.Sq cat\*(Badog
matches either
.Sq cat
or
.Sq dog .
.El
.Sh BASIC REGULAR EXPRESSIONS
Basic regular expressions differ in several respects:
.Bl -bullet -offset 3n
.It
.Sq \*(Ba ,
.Sq + ,
and
.Sq ?\&
are ordinary characters and there is no equivalent
for their functionality.
.It
The delimiters for bounds are
.Sq \e{
and
.Sq \e} ,
with
.Sq {
and
.Sq }
by themselves ordinary characters.
.It
The parentheses for nested subexpressions are
.Sq \e(
and
.Sq \e) ,
with
.Sq (
and
.Sq )\&
by themselves ordinary characters.
.It
.Sq ^
is an ordinary character except at the beginning of the
RE or** the beginning of a parenthesized subexpression.
.It
.Sq $
is an ordinary character except at the end of the
RE or** the end of a parenthesized subexpression.
.It
.Sq *
is an ordinary character if it appears at the beginning of the
RE or the beginning of a parenthesized subexpression
(after a possible leading
.Sq ^ ) .
.It
Finally, there is one new type of atom, a
.Em back-reference :
.Sq \e
followed by a non-zero decimal digit
.Ar d
matches the same sequence of characters matched by the
.Ar d Ns th
parenthesized subexpression
(numbering subexpressions by the positions of their opening parentheses,
left to right),
so that, for example,
.Sq \e([bc]\e)\e1
matches
.Sq bb\&
or
.Sq cc
but not
.Sq bc .
.El
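.Pp
The differences listed above can be seen by compiling the same pattern
both ways.
With the
.Xr regex 3
interfaces, passing
.Dv REG_EXTENDED
to
.Fn regcomp
selects EREs and omitting it selects BREs.
The following C sketch is illustrative only;
.Fn pattern_found
is a hypothetical helper name.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if `pattern` matches somewhere in
 * `text`, 0 if not, -1 on a compilation error.  Pass REG_EXTENDED in
 * `cflags` for an ERE, or 0 for a BRE.
 */
int
pattern_found(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```
.Ed
.Pp
As an ERE,
.Sq a+
should match
.Sq aaa ;
as a BRE it should match only text containing the two literal characters
.Sq a+ .
Likewise the BRE
.Sq \e([bc]\e)\e1
should match
.Sq bb
but not
.Sq bc .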
.Pp
The following is a list of basic regular expressions:
.Bl -tag -width Ds
.It Ar c
Any character
.Ar c
not listed below matches itself.
.It \e Ns Ar c
Any backslash-escaped character
.Ar c ,
except for
.Sq { ,
.Sq } ,
.Sq \&( ,
.Sq \&) ,
and the digits 1 through 9
(which form back-references),
matches itself.
.It \&.
Matches any single character that is not a newline
.Pq Sq \en .
.It Bq Ar char-class
Matches any single character in
.Ar char-class .
To include a
.Ql \&]
in
.Ar char-class ,
it must be the first character.
A range of characters may be specified by separating the end characters
of the range with a
.Ql - ;
e.g.\&
.Ar a-z
specifies the lower case characters.
The following literal expressions can also be used in
.Ar char-class
to specify sets of characters:
.Bd -unfilled -offset indent
[:alnum:] [:cntrl:] [:lower:] [:space:]
[:alpha:] [:digit:] [:print:] [:upper:]
[:blank:] [:graph:] [:punct:] [:xdigit:]
.Ed
.Pp
If
.Ql -
appears as the first or last character of
.Ar char-class ,
then it matches itself.
All other characters in
.Ar char-class
match themselves.
.Pp
Patterns in
.Ar char-class
of the form
.Eo [.
.Ar col-elm
.Ec .]\&
or
.Eo [=
.Ar col-elm
.Ec =]\& ,
where
.Ar col-elm
is a collating element, are interpreted according to
.Xr setlocale 3
.Pq not currently supported .
.It Bq ^ Ns Ar char-class
Matches any single character, other than newline, not in
.Ar char-class .
.Ar char-class
is defined as above.
.It ^
If
.Sq ^
is the first character of a regular expression, then it
anchors the regular expression to the beginning of a line.
Otherwise, it matches itself.
.It $
If
.Sq $
is the last character of a regular expression,
it anchors the regular expression to the end of a line.
Otherwise, it matches itself.
.It [[:<:]]
Anchors the single character regular expression or subexpression
immediately following it to the beginning of a word.
.It [[:>:]]
Anchors the single character regular expression or subexpression
immediately preceding it to the end of a word.
.It \e( Ns Ar re Ns \e)
Defines a subexpression
.Ar re .
Subexpressions may be nested.
A subsequent backreference of the form
.Pf \e Ns Ar n ,
where
.Ar n
is a number in the range [1,9], expands to the text matched by the
.Ar n Ns th
subexpression.
For example, the regular expression
.Ar \e(.*\e)\e1
matches any string consisting of identical adjacent substrings.
Subexpressions are ordered relative to their left delimiter.
.It *
Matches the single character regular expression or subexpression
immediately preceding it zero or more times.
If
.Sq *
is the first character of a regular expression or subexpression,
then it matches itself.
The
.Sq *
operator sometimes yields unexpected results.
For example, the regular expression
.Ar b*
matches the beginning of the string
.Qq abbb
(as opposed to the substring
.Qq bbb ) ,
since a null match is the only leftmost match.
.Sm off
.It Xo
.Pf \e{ Ar n , m No \e}\ \&
.Pf \e{ Ar n , No \e}\ \&
.Pf \e{ Ar n No \e}
.Xc
.Sm on
Matches the single character regular expression or subexpression
immediately preceding it at least
.Ar n
and at most
.Ar m
times.
If
.Ar m
is omitted, then it matches at least
.Ar n
times.
If the comma is also omitted, then it matches exactly
.Ar n
times.
.El
.Sh SEE ALSO
.Xr ctype 3 ,
.Xr regex 3
.Sh STANDARDS
.St -p1003.1-2004 :
Base Definitions, Chapter 9 (Regular Expressions).
.Sh BUGS
Having two kinds of REs is a botch.
.Pp
The current POSIX spec says that
.Sq )\&
is an ordinary character in the absence of an unmatched
.Sq ( ;
this was an unintentional result of a wording error,
and change is likely.
Avoid relying on it.
.Pp
Back-references are a dreadful botch,
posing major problems for efficient implementations.
They are also somewhat vaguely defined
(does
.Sq a\e(\e(b\e)*\e2\e)*d
match
.Sq abbbd ? ) .
Avoid using them.
.Pp
POSIX's specification of case-independent matching is vague.
The
.Dq one case implies all cases
definition given above
is the current consensus among implementors as to the right interpretation.
.Pp
The syntax for word boundaries is incredibly ugly.