.\"
.\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for
.\" permission to reproduce portions of its copyrighted documentation.
.\" Original documentation from The Open Group can be obtained online at
.\" http://www.opengroup.org/bookstore/.
.\"
.\" The Institute of Electrical and Electronics Engineers and The Open
.\" Group, have given us permission to reprint portions of their
.\" documentation.
.\"
.\" In the following statement, the phrase ``this text'' refers to portions
.\" of the system documentation.
.\"
.\" Portions of this text are reprinted and reproduced in electronic form
.\" in the SunOS Reference Manual, from IEEE Std 1003.1, 2004 Edition,
.\" Standard for Information Technology -- Portable Operating System
.\" Interface (POSIX), The Open Group Base Specifications Issue 6,
.\" Copyright (C) 2001-2004 by the Institute of Electrical and Electronics
.\" Engineers, Inc and The Open Group.  In the event of any discrepancy
.\" between these versions and the original IEEE and The Open Group
.\" Standard, the original IEEE and The Open Group Standard is the referee
.\" document.  The original Standard can be obtained online at
.\" http://www.opengroup.org/unix/online.html.
.\"
.\" This notice shall appear on any product containing this material.
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\"
.\" Copyright (c) 1992, X/Open Company Limited  All Rights Reserved
.\" Portions Copyright (c) 1999, Sun Microsystems, Inc.  All Rights Reserved
.\" Copyright 2017 Nexenta Systems, Inc.
.\"
.Dd August 14, 2020
.Dt REGEX 7
.Os
.Sh NAME
.Nm regex
.Nd internationalized basic and extended regular expression matching
.Sh DESCRIPTION
Regular Expressions
.Pq REs
provide a mechanism to select specific strings from a set of character strings.
The Internationalized Regular Expressions described below differ from the Simple
Regular Expressions described on the
.Xr regexp 7
manual page in the following ways:
.Bl -bullet
.It
both Basic and Extended Regular Expressions are supported
.It
the Internationalization features -- character class, equivalence class, and
multi-character collation -- are supported.
.El
.Pp
The Basic Regular Expression
.Pq BRE
notation and construction rules described in the
.Sx BASIC REGULAR EXPRESSIONS
section apply to most utilities supporting regular expressions.
Some utilities, instead, support the Extended Regular Expressions
.Pq ERE
described in the
.Sx EXTENDED REGULAR EXPRESSIONS
section; any exceptions for both cases are noted in the descriptions of the
specific utilities using regular expressions.
Both BREs and EREs are supported by the Regular Expression Matching interfaces
.Xr regcomp 3C
and
.Xr regexec 3C .
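.Pp
As an illustration, the following C sketch selects between the two notations
when calling
.Xr regcomp 3C
and
.Xr regexec 3C .
The helper name
.Fn re_matches
is hypothetical and not part of any library; it assumes that passing 0 in
.Fa cflags
selects BRE syntax and that
.Dv REG_EXTENDED
selects ERE syntax.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a library function): returns 1 if `text`
 * contains a match for `pattern`, 0 if it does not, and -1 if the
 * pattern fails to compile.  Pass 0 in `cflags` for BRE syntax or
 * REG_EXTENDED for ERE syntax.
 */
int
re_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return (-1);
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0 ? 1 : 0);
}
```

With this helper, the BRE call re_matches("a\\{2\\}", "caab", 0) and the ERE
call re_matches("a{2}", "caab", REG_EXTENDED) are both expected to return 1,
since the same interval is spelled differently in each notation.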
.Sh BASIC REGULAR EXPRESSIONS
.Ss BREs Matching a Single Character
A BRE ordinary character, a special character preceded by a backslash, or a
period matches a single character.
A bracket expression matches a single character or a single collating element.
See
.Sx RE Bracket Expression ,
below.
.Ss BRE Ordinary Characters
An ordinary character is a BRE that matches itself: any character in the
supported character set, except for the BRE special characters listed in
.Sx BRE Special Characters ,
below.
.Pp
The interpretation of an ordinary character preceded by a backslash
.Pq Qq \e
is undefined, except for:
.Bl -enum
.It
the characters
.Qq \&) ,
.Qq \&( ,
.Qq { ,
and
.Qq }
.It
the digits 1 to 9 inclusive
.Po see
.Sx BREs Matching Multiple Characters ,
below
.Pc
.It
a character inside a bracket expression.
.El
.Ss BRE Special Characters
A BRE special character has special properties in certain contexts.
Outside those contexts, or when preceded by a backslash, such a character will
be a BRE that matches the special character itself.
The BRE special characters and the contexts in which they have their special
meaning are:
.Bl -tag -width Ds
.It Sy \&. \&[ \&\e
The period, left-bracket, and backslash are special except when used in a
bracket expression
.Po see
.Sx RE Bracket Expression ,
below
.Pc .
An expression containing a
.Qq \&[
that is not preceded by a backslash and is not part of a bracket expression
produces undefined results.
.It Sy *
The asterisk is special except when used:
.Bl -bullet
.It
in a bracket expression
.It
as the first character of an entire BRE
.Po after an initial
.Qq ^ ,
if any
.Pc
.It
as the first character of a subexpression
.Po after an initial
.Qq ^ ,
if any; see
.Sx BREs Matching Multiple Characters ,
below
.Pc .
.El
.It Sy ^
The circumflex is special when used:
.Bl -bullet
.It
as an anchor
.Po see
.Sx BRE Expression Anchoring ,
below
.Pc .
.It
as the first character of a bracket expression
.Po see
.Sx RE Bracket Expression ,
below
.Pc .
.El
.It Sy $
The dollar sign is special when used as an anchor.
.El
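.Pp
The context rules above can be exercised with a short C sketch.
It is only an illustration: the helper name
.Fn bre_matches
is hypothetical, and the behavior described in the usage note assumes a
.Xr regcomp 3C
implementation that follows the rules in this section.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a library function): returns 1 if `text`
 * contains a match for the BRE `pattern`, 0 if it does not, and -1 if
 * the pattern fails to compile.  A cflags value of 0 selects BRE syntax.
 */
int
bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return (-1);
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0 ? 1 : 0);
}
```

Under those assumptions, bre_matches("a.b", "axb") should return 1 because the
period is special, while bre_matches("a\\.b", "axb") should return 0 because
the backslash makes the period match only itself.
Likewise, bre_matches("^*a", "*a") should return 1, since an asterisk that
directly follows an initial circumflex is not special.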
.Ss Periods in BREs
A period
.Pq Qq \&. ,
when used outside a bracket expression, is a BRE that matches any character in
the supported character set except NUL.
.Ss RE Bracket Expression
A bracket expression
.Po an expression enclosed in square brackets,
.Qq []
.Pc
is an RE that matches a single collating element contained in the non-empty set
of collating elements represented by the bracket expression.
.Pp
The following rules and definitions apply to bracket expressions:
.Bl -enum
.It
A
.Em bracket expression
is either a matching list expression or a non-matching list expression.
It consists of one or more expressions: collating elements, collating symbols,
equivalence classes, character classes, or range expressions
.Pq see rule 7 below .
Portable applications must not use range expressions, even though all
implementations support them.
The right-bracket
.Pq Qq \&]
loses its special meaning and represents itself in a bracket expression if it
occurs first in the list
.Po after an initial circumflex
.Pq Qq ^ ,
if any
.Pc .
Otherwise, it terminates the bracket expression, unless it appears in a
collating symbol
.Po such as
.Qq [.].]
.Pc
or is the ending right-bracket for a collating symbol, equivalence class, or
character class.
.Pp
The special characters
.Qq \&. ,
.Qq * ,
.Qq \&[ ,
.Qq \&\e
.Pq period, asterisk, left-bracket and backslash, respectively
lose their special meaning within a bracket expression.
.Pp
The character sequences
.Qq [. ,
.Qq [= ,
.Qq [:
.Pq left-bracket followed by a period, equals-sign, or colon
are special inside a bracket expression and are used to delimit collating
symbols, equivalence class expressions, and character class expressions.
These symbols must be followed by a valid expression and the matching
terminating sequence
.Qq .] ,
.Qq =]
or
.Qq :] ,
as described in the following items.
.It
A
.Em matching list expression
specifies a list that matches any one of the expressions represented in the
list.
The first character in the list must not be the circumflex.
For example,
.Qq [abc]
is an RE that matches any of the characters
.Qq a ,
.Qq b
or
.Qq c .
.It
A
.Em non-matching list expression
begins with a circumflex
.Pq Qq ^ ,
and specifies a list that matches any character or collating element except for
the expressions represented in the list after the leading circumflex.
For example,
.Qq [^abc]
is an RE that matches any character or collating element except the characters
.Qq a ,
.Qq b ,
or
.Qq c .
The circumflex will have this special meaning only when it occurs first in the
list, immediately following the left-bracket.
.It
A
.Em collating symbol
is a collating element enclosed within bracket-period
.Pq Qq [..]
delimiters.
Multi-character collating elements must be represented as collating symbols when
it is necessary to distinguish them from a list of the individual characters
that make up the multi-character collating element.
For example, if the string
.Qq ch
is a collating element in the current collation sequence with the associated
collating symbol
.Qq Aq ch ,
the expression
.Qq [[.ch.]]
will be treated as an RE matching the character sequence
.Qq ch ,
while
.Qq [ch]
will be treated as an RE matching
.Qq c
or
.Qq h .
Collating symbols will be recognized only inside bracket expressions.
This implies that the RE
.Qq [[.ch.]]*c
matches the first to fifth character in the string
.Qq chchch .
If the string is not a collating element in the current collating sequence
definition, or if the collating element has no characters associated with it,
the symbol will be treated as an invalid expression.
.It
An
.Em equivalence class expression
represents the set of collating elements belonging to an equivalence class.
Only primary equivalence classes will be recognized.
The class is expressed by enclosing any one of the collating elements in the
equivalence class within bracket-equal
.Pq Qq [==]
delimiters.
For example, if
.Qq a
and
.Qq b
belong to the same equivalence class, then
.Qq [[=a=]b] ,
.Qq [[=a=]]
and
.Qq [[=b=]]
will each be equivalent to
.Qq [ab] .
If the collating element does not belong to an equivalence class, the
equivalence class expression will be treated as a
.Em collating symbol .
.It
A
.Em character class expression
represents the set of characters belonging to a character class, as defined in
the
.Ev LC_CTYPE
category in the current locale.
All character classes specified in the current locale will be recognized.
A character class expression is expressed as a character class name enclosed
within bracket-colon
.Pq Qq [::]
delimiters.
.Pp
The following character class expressions are supported in all locales:
.Bl -column "[:alnum:]" "[:cntrl:]" "[:lower:]" "[:xdigit:]"
.It [:alnum:] Ta [:cntrl:] Ta [:lower:] Ta [:space:]
.It [:alpha:] Ta [:digit:] Ta [:print:] Ta [:upper:]
.It [:blank:] Ta [:graph:] Ta [:punct:] Ta [:xdigit:]
.El
.Pp
In addition, character class expressions of the form
.Qq [:name:]
are recognized in those locales where the
.Em name
keyword has been given a
.Em charclass
definition in the
.Ev LC_CTYPE
category.
.It
A
.Em range expression
represents the set of collating elements that fall between two elements in the
current collation sequence, inclusively.
It is expressed as the starting point and the ending point separated by a hyphen
.Pq Qq - .
.Pp
Range expressions must not be used in portable applications because their
behavior is dependent on the collating sequence.
Ranges will be treated according to the current collating sequence, and include
such characters that fall within the range based on that collating sequence,
regardless of character values.
This, however, means that the interpretation will differ depending on collating
sequence.
If, for instance, one collating sequence defines
.Qq \(:a
as a variant of
.Qq a ,
while another defines it as a letter following
.Qq z ,
then the expression
.Qq [\(:a-z]
is valid in the first language and invalid in the second.
.Pp
In the following, all examples assume the collation sequence specified for the
POSIX locale, unless another collation sequence is specifically defined.
.Pp
The starting range point and the ending range point must be a collating element
or collating symbol.
An equivalence class expression used as a starting or ending point of a range
expression produces unspecified results.
An equivalence class can be used portably within a bracket expression, but only
outside the range.
For example, the unspecified expression
.Qq [[=e=]-f]
should be given as
.Qq [[=e=]e-f] .
The ending range point must collate equal to or higher than the starting range
point; otherwise, the expression will be treated as invalid.
The order used is the order in which the collating elements are specified in the
current collation definition.
One-to-many mappings
.Po see
.Xr locale 7
.Pc
will not be performed.
For example, assuming that the character eszet
.Pq Qq \(ss
is placed in the collation sequence after
.Qq r
and
.Qq s ,
but before
.Qq t ,
and that it maps to the sequence
.Qq ss
for collation purposes, then the expression
.Qq [r-s]
matches only
.Qq r
and
.Qq s ,
but the expression
.Qq [s-t]
matches
.Qq s ,
.Qq \(ss ,
or
.Qq t .
.Pp
The interpretation of range expressions where the ending range point is also
the starting range point of a subsequent range expression
.Po for instance
.Qq [a-m-o]
.Pc
is undefined.
.Pp
The hyphen character will be treated as itself if it occurs first
.Po after an initial
.Qq ^ ,
if any
.Pc
or last in the list, or as an ending range point in a range expression.
As examples, the expressions
.Qq [-ac]
and
.Qq [ac-]
are equivalent and match any of the characters
.Qq a ,
.Qq c ,
or
.Qq - ;
.Qq [^-ac]
and
.Qq [^ac-]
are equivalent and match any characters except
.Qq a ,
.Qq c ,
or
.Qq - ;
the expression
.Qq [%--]
matches any of the characters between
.Qq %
and
.Qq -
inclusive; the expression
.Qq [--@]
matches any of the characters between
.Qq -
and
.Qq @
inclusive; and the expression
.Qq [a--@]
is invalid, because the letter
.Qq a
follows the symbol
.Qq -
in the POSIX locale.
To use a hyphen as the starting range point, it must either come first in the
bracket expression or be specified as a collating symbol, for example:
.Qq [][.-.]-0] ,
which matches either a right bracket or any character or collating element that
collates between hyphen and 0, inclusive.
.Pp
If a bracket expression must specify both
.Qq -
and
.Qq \&] ,
the
.Qq \&]
must be placed first
.Po after the
.Qq ^ ,
if any
.Pc
and the
.Qq -
last within the bracket expression.
.El
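.Pp
The list, character class, and placement rules above can be tried out with a
short C sketch.
It is illustrative only: the helper name
.Fn bre_matches
is hypothetical, the program is assumed to run in the POSIX locale (no call to
.Xr setlocale 3C ) ,
and range expressions are deliberately avoided.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a library function): returns 1 if `text`
 * contains a match for the BRE `pattern`, 0 if it does not, and -1 if
 * the pattern fails to compile.
 */
int
bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return (-1);
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (rc == 0 ? 1 : 0);
}

/* Patterns illustrating the rules; each is anchored to a whole string. */
const char *const bracket_examples[] = {
    "^[abc]$",                      /* matching list: a, b or c */
    "^[^abc]$",                     /* non-matching list */
    "^[[:digit:]][[:digit:]]*$",    /* character class, one or more digits */
    "^[]-]$",                       /* "]" first and "-" last are literal */
    "^[.*]$",                       /* "." and "*" lose their special meaning */
};
```

For example, bre_matches("^[]-]$", "]") and bre_matches("^[]-]$", "-") are both
expected to return 1, while bre_matches("^[^abc]$", "b") is expected to
return 0.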
.Pp
Note: Latin-1 characters such as
.Qq \(ga
or
.Qq ^
are not printable in some locales, for example, the
.Em ja
locale.
.Ss BREs Matching Multiple Characters
The following rules can be used to construct BREs matching multiple characters
from BREs matching a single character:
.Bl -enum
.It
The concatenation of BREs matches the concatenation of the strings matched
by each component of the BRE.
.It
A
.Em subexpression
can be defined within a BRE by enclosing it between the character pairs
.Qq \e(
and
.Qq \e) .
Such a subexpression matches whatever it would have matched without the
.Qq \e(
and
.Qq \e) ,
except that anchoring within subexpressions is optional behavior; see
.Sx BRE Expression Anchoring ,
below.
Subexpressions can be arbitrarily nested.
.It
The
.Em back-reference
expression
.Qq \e Ns Em n
matches the same
.Pq possibly empty
string of characters as was matched by a subexpression enclosed between
.Qq \e(
and
.Qq \e)
preceding the
.Qq \e Ns Em n .
The character
.Qq Em n
must be a digit from 1 to 9 inclusive, specifying the
.Em n Ns th
subexpression
.Po the one that begins with the
.Em n Ns th
.Qq \e(
and ends with the corresponding paired
.Qq \e)
.Pc .
The expression is invalid if fewer than
.Em n
subexpressions precede the
.Qq \e Ns Em n .
For example, the expression
.Qq ^\e(.*\e)\e1$
matches a line consisting of two adjacent appearances of the same string, and
the expression
.Qq \e(a\e)*\e1
fails to match
.Qq a .
The limit of nine back-references to subexpressions in the RE is based on the
use of a single digit identifier.
This does not imply that only nine subexpressions are allowed in REs.
557*bbf21555SRichard Lowe.It
558*bbf21555SRichard LoweWhen a BRE matching a single character, a subexpression or a back-reference is
559*bbf21555SRichard Lowefollowed by the special character asterisk
560*bbf21555SRichard Lowe.Pq Qq * ,
561*bbf21555SRichard Lowetogether with that asterisk it matches what zero or more consecutive occurrences
562*bbf21555SRichard Loweof the BRE would match.
563*bbf21555SRichard LoweFor example,
564*bbf21555SRichard Lowe.Qq [ab]*
565*bbf21555SRichard Loweand
566*bbf21555SRichard Lowe.Qq [ab][ab]
567*bbf21555SRichard Loweare equivalent when matching the string
568*bbf21555SRichard Lowe.Qq ab .
569*bbf21555SRichard Lowe.It
570*bbf21555SRichard LoweWhen a BRE matching a single character, a subexpression, or a back-reference
571*bbf21555SRichard Loweis followed by an
572*bbf21555SRichard Lowe.Em interval expression
573*bbf21555SRichard Loweof the format
574*bbf21555SRichard Lowe.Qq \e{ Ns Em m Ns \e} ,
575*bbf21555SRichard Lowe.Qq \e{ Ns Em m Ns ,\e}
576*bbf21555SRichard Loweor
577*bbf21555SRichard Lowe.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e} ,
578*bbf21555SRichard Lowetogether with that interval expression it matches what repeated consecutive
579*bbf21555SRichard Loweoccurrences of the BRE would match.
580*bbf21555SRichard LoweThe values of
581*bbf21555SRichard Lowe.Em m
582*bbf21555SRichard Loweand
583*bbf21555SRichard Lowe.Em n
584*bbf21555SRichard Lowewill be decimal integers in the range 0 <=
585*bbf21555SRichard Lowe.Em m
586*bbf21555SRichard Lowe<=
587*bbf21555SRichard Lowe.Em n
588*bbf21555SRichard Lowe<=
589*bbf21555SRichard Lowe.Dv BRE_DUP_MAX ,
where
.Em m
specifies the exact or minimum number of occurrences and
.Em n
specifies the maximum number of occurrences.
The expression
.Qq \e{ Ns Em m Ns \e}
matches exactly
.Em m
occurrences of the preceding BRE,
.Qq \e{ Ns Em m Ns ,\e}
matches at least
.Em m
occurrences and
.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e}
matches any number of occurrences between
.Em m
and
.Em n ,
inclusive.
.Pp
For example, in the string
.Qq abababccccccd ,
the BRE
.Qq c\e{3\e}
is matched by characters seven to nine, the BRE
.Qq \e(ab\e)\e{4,\e}
is not matched at all and the BRE
.Qq c\e{1,3\e}d
is matched by characters ten to thirteen.
.El
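.Pp
The interval examples above can be checked mechanically.
The following is a minimal sketch, not part of any illumos interface.
It assumes Python's re module, which has no BRE mode, so a hypothetical helper named bre_to_python() first swaps the escaped and unescaped forms of parentheses and braces.
The helper handles only the constructs used in these examples; it ignores bracket expressions and a leading asterisk.
The name char_range() is likewise illustrative, and positions are reported 1-based and inclusive to match the text.

```python
import re


def bre_to_python(bre):
    """Translate a simple BRE to Python syntax (hypothetical helper).

    Assumption: the BRE contains no bracket expressions and does not
    start with an asterisk; only the constructs in the examples are handled.
    """
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            # \( \) \{ \} are operators in a BRE; Python spells them bare.
            out.append(nxt if nxt in "(){}" else ch + nxt)
            i += 2
        else:
            # ( ) { } + ? | are ordinary in a BRE; escape them for Python.
            out.append("\\" + ch if ch in "(){}+?|" else ch)
            i += 1
    return "".join(out)


def char_range(bre, string):
    """Return the 1-based inclusive (first, last) characters matched, or None."""
    m = re.search(bre_to_python(bre), string)
    if m is None:
        return None
    return (m.start() + 1, m.end())


SUBJECT = "abababccccccd"
print(char_range(r"c\{3\}", SUBJECT))        # characters seven to nine
print(char_range(r"\(ab\)\{4,\}", SUBJECT))  # not matched at all
print(char_range(r"c\{1,3\}d", SUBJECT))     # characters ten to thirteen
```

The second pattern fails because the string holds only three consecutive occurrences of ab, one short of the minimum of four.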
.Pp
The behavior of multiple adjacent duplication symbols
.Po Qq *
and intervals
.Pc
produces undefined results.
.Ss BRE Precedence
The order of precedence is as shown in the following table:
.Bl -column "BRE Precedence (from high to low)" ""
.It Sy BRE Precedence (from high to low) Ta
.It collation-related bracket symbols Ta [= =]  [: :]  [. .]
.It escaped characters Ta \e< Ns Em special character Ns >
.It bracket expression Ta [ ]
.It subexpressions/back-references Ta \e( \e) \e Ns Em n
.It single-character-BRE duplication Ta * \e{ Ns Em m Ns \&, Ns Em n Ns \e}
.It concatenation Ta
.It anchoring Ta ^ $
.El
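.Pp
One consequence of the table is that duplication binds more tightly than concatenation, while a subexpression binds more tightly than duplication.
The sketch below illustrates this under an assumption: it uses Python's re module, which is not a POSIX engine and writes a subexpression with bare parentheses rather than the escaped parentheses of a BRE.
The variable names are illustrative only.

```python
import re

# Duplication outranks concatenation, so "ab*" means "a" followed by zero
# or more "b", not zero or more repetitions of "ab".
star_on_b = re.compile(r"ab*")

# Grouping outranks duplication, so the asterisk applies to the whole group.
# Python spells the BRE subexpression with bare parentheses.
star_on_group = re.compile(r"(ab)*")

for subject in ("abbb", "abab"):
    print(subject,
          star_on_b.fullmatch(subject) is not None,
          star_on_group.fullmatch(subject) is not None)
```

Each pattern accepts exactly one of the two subject strings as a whole-string match.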
.Ss BRE Expression Anchoring
A BRE can be limited to matching strings that begin or end a line; this is
called
.Em anchoring .
The circumflex and dollar sign special characters will be considered BRE anchors
in the following contexts:
.Bl -enum
.It
A circumflex
.Pq Qq ^
is an anchor when used as the first character of an entire BRE.
The implementation may treat circumflex as an anchor when used as the first
character of a subexpression.
The circumflex will anchor the expression to the beginning of a string;
only sequences starting at the first character of a string will be matched by
the BRE.
For example, the BRE
.Qq ^ab
matches
.Qq ab
in the string
.Qq abcdef ,
but fails to match in the string
.Qq cdefab .
A portable BRE must escape a leading circumflex in a subexpression to match a
literal circumflex.
.It
A dollar sign
.Pq Qq $
is an anchor when used as the last character of an entire BRE.
The implementation may treat a dollar sign as an anchor when used as the last
character of a subexpression.
The dollar sign will anchor the expression to the end of the string being
matched; the dollar sign can be said to match the end-of-string following the
last character.
.It
A BRE anchored by both
.Qq ^
and
.Qq $
matches only an entire string.
For example, the BRE
.Qq ^abcdef$
matches strings consisting only of
.Qq abcdef .
.It
.Qq ^
and
.Qq $
are not special in subexpressions.
.El
.Pp
Note: The Solaris implementation does not support anchoring in BRE
subexpressions.
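.Pp
The anchoring rules for an entire BRE can be tried out with a short sketch.
It assumes Python's re module, whose handling of a leading circumflex and a trailing dollar sign agrees with the rules above for subject strings that contain no newline.
The helper name find() is illustrative, and the pattern ab$ is an added example that does not appear in the text.

```python
import re


def find(pattern, string):
    """Return the matched text, or None when the pattern fails to match."""
    m = re.search(pattern, string)
    return m.group(0) if m else None


# A leading circumflex anchors the match to the start of the string.
print(find(r"^ab", "abcdef"))
print(find(r"^ab", "cdefab"))

# A trailing dollar sign anchors the match to the end of the string.
print(find(r"ab$", "cdefab"))
print(find(r"ab$", "abcdef"))

# Anchored at both ends, the pattern must match the entire string.
print(find(r"^abcdef$", "abcdef"))
print(find(r"^abcdef$", "xabcdef"))
```

Python's dollar sign also matches just before a trailing newline, which is why the subject strings here avoid one.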
.Sh EXTENDED REGULAR EXPRESSIONS
The rules specified for BREs apply to Extended Regular Expressions
.Pq EREs
with the following exceptions:
.Bl -bullet
.It
The characters
.Qq | ,
.Qq + ,
and
.Qq \&?
have special meaning, as defined below.
.It
The
.Qq {
and
.Qq }
characters, when used as the duplication operator, are not preceded by
backslashes.
The constructs
.Qq \e{
and
.Qq \e}
simply match the characters
.Qq {
and
.Qq } ,
respectively.
.It
The back reference operator is not supported.
.It
Anchoring
.Pq Qq ^$
is supported in subexpressions.
.El
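.Pp
The difference between bare and escaped braces, and the new operators, can be illustrated with a small sketch.
It assumes Python's re module, which is not a POSIX ERE engine but treats these particular constructs the same way.
All variable names are illustrative.

```python
import re

# Unescaped braces form an interval expression: exactly two "a" characters.
interval = re.compile(r"a{2}")

# Backslash-escaped braces match the literal characters "{" and "}".
literal = re.compile(r"a\{2\}")

# "|", "+" and "?" are operators in an ERE.
either = re.compile(r"ab+c?|xyz")

print(interval.fullmatch("aa") is not None)    # the interval applies
print(literal.fullmatch("a{2}") is not None)   # literal braces match
print(literal.fullmatch("aa") is not None)     # no interval here
accepted = [s for s in ("ab", "abbbc", "xyz", "a") if either.fullmatch(s)]
print(accepted)
```

Python accepts back-references in this syntax, so it cannot be used to demonstrate the third exception in the list.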
.Ss EREs Matching a Single Character
An ERE ordinary character, a special character preceded by a backslash, or a
period matches a single character.
A bracket expression matches a single character or a single collating element.
An
.Em ERE matching a single character
enclosed in parentheses matches the same as the ERE without parentheses would
have matched.
.Ss ERE Ordinary Characters
An
.Em ordinary character
is an ERE that matches itself.
An ordinary character is any character in the supported character set, except
for the ERE special characters listed in
.Sx ERE Special Characters
below.
The interpretation of an ordinary character preceded by a backslash
.Pq Qq \&\e
is undefined.
.Ss ERE Special Characters
An
.Em ERE special character
has special properties in certain contexts.
Outside those contexts, or when preceded by a backslash, such a character is an
ERE that matches the special character itself.
The extended regular expression special characters and the contexts in which
they have their special meaning are:
.Bl -tag -width Ds
.It Sy \&. \&[ \&\e \&(
The period, left-bracket, backslash, and left-parenthesis are special except
when used in a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
Outside a bracket expression, a left-parenthesis immediately followed by a
right-parenthesis produces undefined results.
.It Sy \&)
The right-parenthesis is special when matched with a preceding
left-parenthesis, both outside a bracket expression.
.It Sy * + \&? {
The asterisk, plus-sign, question-mark, and left-brace are special except when
used in a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
Any of the following uses produce undefined results:
.Bl -bullet
.It
if these characters appear first in an ERE, or immediately following a
vertical-line, circumflex or left-parenthesis
.It
if a left-brace is not part of a valid interval expression.
.El
.It Sy \&|
The vertical-line is special except when used in a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
A vertical-line appearing first or last in an ERE, or immediately following a
vertical-line or a left-parenthesis, or immediately preceding a
right-parenthesis, produces undefined results.
.It Sy ^
The circumflex is special when used:
.Bl -bullet
.It
as an anchor
.Po see
.Sx ERE Expression Anchoring ,
below
.Pc .
.It
as the first character of a bracket expression
.Po see
.Sx RE Bracket Expression ,
above
.Pc .
.El
.It Sy $
The dollar sign is special when used as an anchor.
.El
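.Pp
The two ways of stripping a special character of its meaning, a preceding backslash and placement inside a bracket expression, can be sketched as follows.
This is an illustration under an assumption: it uses Python's re module, which is not a POSIX engine.
The backslash is deliberately left out of the bracket expression because Python, unlike the rules above, still treats it as an escape there.
All variable names are illustrative.

```python
import re

# Outside a bracket expression the period is special.
any_middle = re.compile(r"a.c")

# Preceded by a backslash it matches only a literal period.
dot_middle = re.compile(r"a\.c")

# Inside a bracket expression these characters stand for themselves.
in_brackets = re.compile(r"[.(*+?{|]")

print(any_middle.fullmatch("abc") is not None)
print(dot_middle.fullmatch("abc") is not None)
print(dot_middle.fullmatch("a.c") is not None)
print(all(in_brackets.fullmatch(ch) for ch in ".(*+?{|"))
print(in_brackets.fullmatch("a") is not None)
```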
.Ss Periods in EREs
A period
.Pq Qq \&. ,
when used outside a bracket expression, is an ERE that matches any character in
the supported character set except NUL.
.Ss ERE Bracket Expression
The rules for ERE Bracket Expressions are the same as for Basic Regular
Expressions; see
.Sx RE Bracket Expression ,
above.
.Ss EREs Matching Multiple Characters
The following rules will be used to construct EREs matching multiple characters
from EREs matching a single character:
.Bl -enum
.It
A
.Em concatenation of EREs
matches the concatenation of the character sequences matched by each component
of the ERE.
A concatenation of EREs enclosed in parentheses matches whatever the
concatenation without the parentheses matches.
For example, both the ERE
.Qq cd
and the ERE
.Qq (cd)
are matched by the third and fourth characters of the string
.Qq abcdefabcdef .
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by the special character plus-sign
.Pq Qq + ,
together with that plus-sign it matches what one or more consecutive occurrences
of the ERE would match.
For example, the ERE
.Qq b+(bc)
matches the fourth to seventh characters in the string
.Qq acabbbcde ;
.Qq [ab]+
and
.Qq [ab][ab]*
are equivalent.
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by the special character asterisk
.Pq Qq * ,
together with that asterisk it matches what zero or more consecutive occurrences
of the ERE would match.
For example, the ERE
.Qq b*c
matches the first character in the string
.Qq cabbbcde ,
and the ERE
.Qq b*cd
matches the third to seventh characters in the string
.Qq cabbbcdebbbbbbcdbc .
And,
.Qq [ab]*
and
.Qq [ab][ab]
are equivalent when matching the string
.Qq ab .
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by the special character question-mark
.Pq Qq \&? ,
together with that question-mark it matches what zero or one consecutive
occurrences of the ERE would match.
For example, the ERE
.Qq b?c
matches the second character in the string
.Qq acabbbcde .
.It
When an ERE matching a single character or an ERE enclosed in parentheses is
followed by an
.Em interval expression
of the format
.Qq { Ns Em m Ns } ,
.Qq { Ns Em m Ns ,}
or
.Qq { Ns Em m Ns \&, Ns Em n Ns } ,
together with that interval expression it matches what repeated consecutive
occurrences of the ERE would match.
The values of
.Em m
and
.Em n
will be decimal integers in the range 0 <=
.Em m
<=
.Em n
<=
.Dv RE_DUP_MAX ,
where
.Em m
specifies the exact or minimum number of occurrences and
.Em n
specifies the maximum number of occurrences.
The expression
.Qq { Ns Em m Ns }
matches exactly
.Em m
occurrences of the preceding ERE,
.Qq { Ns Em m Ns ,}
matches at least
.Em m
occurrences and
.Qq { Ns Em m Ns \&, Ns Em n Ns }
matches any number of occurrences between
.Em m
and
.Em n ,
inclusive.
.El
.Pp
For example, in the string
.Qq abababccccccd ,
the ERE
.Qq c{3}
is matched by characters seven to nine and the ERE
.Qq (ab){2,}
is matched by characters one to six.
.Pp
The behavior of multiple adjacent duplication symbols
.Po
.Qq + ,
.Qq * ,
.Qq \&?
and intervals
.Pc
produces undefined results.
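.Pp
The worked examples in this subsection can be reproduced with a short sketch.
It assumes Python's re module, which is not a POSIX ERE engine; for these particular patterns and strings it selects the same leftmost match.
The helper name char_range() is illustrative, and positions are reported 1-based and inclusive to match the text.

```python
import re


def char_range(ere, string):
    """Return the 1-based inclusive (first, last) characters matched, or None."""
    m = re.search(ere, string)
    return (m.start() + 1, m.end()) if m else None


EXAMPLES = [
    # (ERE, subject string)
    (r"b+(bc)", "acabbbcde"),
    (r"b*c", "cabbbcde"),
    (r"b*cd", "cabbbcdebbbbbbcdbc"),
    (r"b?c", "acabbbcde"),
    (r"c{3}", "abababccccccd"),
    (r"(ab){2,}", "abababccccccd"),
]

for ere, subject in EXAMPLES:
    print(ere, subject, char_range(ere, subject))
```

A pair whose two numbers are equal means that a single character was matched.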
.Ss ERE Alternation
Two EREs separated by the special character vertical-line
.Pq Qq |
match a string that is matched by either.
For example, the ERE
.Qq a((bc)|d)
matches the string
.Qq abc
and the string
.Qq ad .
Single characters, or expressions matching single characters, separated by the
vertical bar and enclosed in parentheses, will be treated as an ERE matching a
single character.
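.Pp
A minimal sketch of alternation follows, assuming Python's re module.
Python tries alternatives from left to right rather than choosing the longest match as a POSIX engine does, so the sketch uses whole-string matching, where the two approaches accept the same strings.
The pattern (a|b)+ and all variable names are illustrative additions.

```python
import re

alternation = re.compile(r"a((bc)|d)")
accepted = [s for s in ("abc", "ad", "ab", "abd", "abcd")
            if alternation.fullmatch(s)]
print(accepted)

# A parenthesized alternation of single characters can take a
# duplication symbol, just like any other single-character ERE.
single = re.compile(r"(a|b)+")
print(single.fullmatch("abba") is not None)
print(single.fullmatch("abca") is not None)
```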
.Ss ERE Precedence
The order of precedence will be as shown in the following table:
.Bl -column "ERE Precedence (from high to low)" ""
.It Sy ERE Precedence (from high to low) Ta
.It collation-related bracket symbols Ta [= =]  [: :]  [. .]
.It escaped characters Ta \e< Ns Em special character Ns >
.It bracket expression Ta \&[ \&]
.It grouping Ta \&( \&)
.It single-character-ERE duplication Ta * + \&? { Ns Em m Ns \&, Ns Em n Ns }
.It concatenation Ta
.It anchoring Ta ^  $
.It alternation Ta |
.El
.Pp
For example, the ERE
.Qq abba|cde
matches either the string
.Qq abba
or the string
.Qq cde
.Po rather than the string
.Qq abbade
or
.Qq abbcde ,
because concatenation has a higher order of precedence than alternation
.Pc .
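.Pp
The precedence example can be checked directly, and contrasted with a grouped form that does accept the two longer strings.
This is a sketch assuming Python's re module and whole-string matching; the grouped pattern abb(a|c)de is an illustrative addition and does not appear in the text.

```python
import re

plain = re.compile(r"abba|cde")         # alternation applies last
grouped = re.compile(r"abb(a|c)de")     # parentheses narrow the alternation

SUBJECTS = ("abba", "cde", "abbade", "abbcde")
plain_hits = [s for s in SUBJECTS if plain.fullmatch(s)]
grouped_hits = [s for s in SUBJECTS if grouped.fullmatch(s)]

print(plain_hits)
print(grouped_hits)
```

Grouping has a higher precedence than concatenation, so the parentheses confine the vertical-line to the single characters a and c.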
.Ss ERE Expression Anchoring
An ERE can be limited to matching strings that begin or end a line; this is
called
.Em anchoring .
The circumflex and dollar sign special characters are considered ERE anchors
when used anywhere outside a bracket expression.
This has the following effects:
.Bl -enum
.It
A circumflex
.Pq Qq ^
outside a bracket expression anchors the expression or subexpression it begins
to the beginning of a string; such an expression or subexpression can match only
a sequence starting at the first character of a string.
For example, the EREs
.Qq ^ab
and
.Qq (^ab)
match
.Qq ab
in the string
.Qq abcdef ,
but fail to match in the string
.Qq cdefab ,
and the ERE
.Qq a^b
is valid, but can never match because the
.Qq a
prevents the expression
.Qq ^b
from matching starting at the first character.
.It
A dollar sign
.Pq Qq $
outside a bracket expression anchors the expression or subexpression it ends to
the end of a string; such an expression or subexpression can match only a
sequence ending at the last character of a string.
For example, the EREs
.Qq ef$
and
.Qq (ef$)
match
.Qq ef
in the string
.Qq abcdef ,
but fail to match in the string
.Qq cdefab ,
and the ERE
.Qq e$f
is valid, but can never match because the
.Qq f
prevents the expression
.Qq e$
from matching ending at the last character.
.El
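.Pp
The ERE anchoring examples above, including the two valid patterns that can never match, can be sketched as follows.
The sketch assumes Python's re module in its default single-line mode, which treats the circumflex and dollar sign as anchors anywhere outside a bracket expression.
The helper name find() and the extra subject strings tried against a^b and e$f are illustrative.

```python
import re


def find(ere, string):
    """Return the matched text, or None when the ERE fails to match."""
    m = re.search(ere, string)
    return m.group(0) if m else None


# Anchors work the same at the top level and inside a subexpression.
for ere in (r"^ab", r"(^ab)", r"ef$", r"(ef$)"):
    print(ere, find(ere, "abcdef"), find(ere, "cdefab"))

# These compile without error but no subject string can satisfy them.
never_start = [find(r"a^b", s) for s in ("ab", "a^b", "b", "aab")]
never_end = [find(r"e$f", s) for s in ("ef", "e$f", "e", "eff")]
print(never_start)
print(never_end)
```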
.Sh SEE ALSO
.Xr localedef 1 ,
.Xr regcomp 3C ,
.Xr attributes 7 ,
.Xr environ 7 ,
.Xr locale 7 ,
.Xr regexp 7