1*63d4abf0Sagc<h1>TRE Regexp Syntax</h1>
2*63d4abf0Sagc
3*63d4abf0Sagc<p>
4*63d4abf0SagcThis document describes the POSIX 1003.2 extended RE (ERE) syntax and
5*63d4abf0Sagcthe basic RE (BRE) syntax as implemented by TRE, and the TRE extensions
6*63d4abf0Sagcto the ERE syntax.  A simple Extended Backus-Naur Form (EBNF) style
7*63d4abf0Sagcnotation is used to describe the grammar.
8*63d4abf0Sagc</p>
9*63d4abf0Sagc
10*63d4abf0Sagc<h2>ERE Syntax</h2>
11*63d4abf0Sagc
12*63d4abf0Sagc<h3>Alternation operator</h3>
13*63d4abf0Sagc<a name="alternation"></a>
14*63d4abf0Sagc<a name="extended-regexp"></a>
15*63d4abf0Sagc
16*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
17*63d4abf0Sagc<tr><td>
18*63d4abf0Sagc<pre>
19*63d4abf0Sagc<i>extended-regexp</i> ::= <a href="#branch"><i>branch</i></a>
20*63d4abf0Sagc                |   <i>extended-regexp</i> <b>"|"</b> <a href="#branch"><i>branch</i></a>
21*63d4abf0Sagc</pre>
22*63d4abf0Sagc</td></tr>
23*63d4abf0Sagc</table>
24*63d4abf0Sagc<p>
25*63d4abf0SagcAn extended regexp (ERE) is one or more <i>branches</i>, separated by
26*63d4abf0Sagc<tt>|</tt>.  An ERE matches anything that matches one or more of the
27*63d4abf0Sagcbranches.
28*63d4abf0Sagc</p>
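<p>
As a rough illustration of alternation, the following sketch compiles an ERE
through the POSIX <code>regcomp</code>/<code>regexec</code> interface and reports
whether it matches.  The helper name <code>ere_matches</code> is hypothetical, and
the sketch assumes the system <code>&lt;regex.h&gt;</code> rather than a TRE build.
</p>

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the ERE `pattern` matches somewhere in
   `text`, 0 if it does not, and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
With this helper, <code>ere_matches("cat|dog", "hot dog")</code> reports a match
because the second branch matches, while <code>ere_matches("cat|dog", "bird")</code>
does not.
</p>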
29*63d4abf0Sagc
30*63d4abf0Sagc<h3>Catenation of REs</h3>
31*63d4abf0Sagc<a name="catenation"></a>
32*63d4abf0Sagc<a name="branch"></a>
33*63d4abf0Sagc
34*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
35*63d4abf0Sagc<tr><td>
36*63d4abf0Sagc<pre>
37*63d4abf0Sagc<i>branch</i> ::= <i>piece</i>
38*63d4abf0Sagc       |   <i>branch</i> <i>piece</i>
39*63d4abf0Sagc</pre>
40*63d4abf0Sagc</td></tr>
41*63d4abf0Sagc</table>
42*63d4abf0Sagc<p>
A branch is one or more <i>pieces</i> concatenated.  It matches a
string consisting of a match for the first piece, followed by a match
for the second piece, and so on.
46*63d4abf0Sagc</p>
47*63d4abf0Sagc
48*63d4abf0Sagc
49*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
50*63d4abf0Sagc<tr><td>
51*63d4abf0Sagc<pre>
52*63d4abf0Sagc<i>piece</i> ::= <i>atom</i>
53*63d4abf0Sagc      |   <i>atom</i> <a href="#repeat-operator"><i>repeat-operator</i></a>
54*63d4abf0Sagc      |   <i>atom</i> <a href="#approx-settings"><i>approx-settings</i></a>
55*63d4abf0Sagc</pre>
56*63d4abf0Sagc</td></tr>
57*63d4abf0Sagc</table>
58*63d4abf0Sagc<p>
59*63d4abf0SagcA piece is an <i>atom</i> possibly followed by a repeat operator or an
60*63d4abf0Sagcexpression controlling approximate matching parameters for the <i>atom</i>.
61*63d4abf0Sagc</p>
62*63d4abf0Sagc
63*63d4abf0Sagc
64*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
65*63d4abf0Sagc<tr><td>
66*63d4abf0Sagc<pre>
67*63d4abf0Sagc<i>atom</i> ::= <b>"("</b> <i>extended-regexp</i> <b>")"</b>
68*63d4abf0Sagc     |   <a href="#bracket-expression"><i>bracket-expression</i></a>
69*63d4abf0Sagc     |   <b>"."</b>
70*63d4abf0Sagc     |   <a href="#assertion"><i>assertion</i></a>
71*63d4abf0Sagc     |   <a href="#literal"><i>literal</i></a>
72*63d4abf0Sagc     |   <a href="#backref"><i>back-reference</i></a>
73*63d4abf0Sagc     |   <b>"(?#"</b> <i>comment-text</i> <b>")"</b>
74*63d4abf0Sagc     |   <b>"(?"</b> <a href="#options"><i>options</i></a> <b>")"</b> <i>extended-regexp</i>
75*63d4abf0Sagc     |   <b>"(?"</b> <a href="#options"><i>options</i></a> <b>":"</b> <i>extended-regexp</i> <b>")"</b>
76*63d4abf0Sagc</pre>
77*63d4abf0Sagc</td></tr>
78*63d4abf0Sagc</table>
79*63d4abf0Sagc<p>
An atom is either an ERE enclosed in parentheses, a bracket
expression, a <tt>.</tt> (period), an assertion, a literal, a back
reference, a comment, or an ERE with options applied to it.
82*63d4abf0Sagc</p>
83*63d4abf0Sagc
84*63d4abf0Sagc<p>
85*63d4abf0SagcThe dot (<tt>.</tt>) matches any single character.
86*63d4abf0SagcIf the <code>REG_NEWLINE</code> compilation flag (see <a
87*63d4abf0Sagchref="api.html">API manual</a>) is specified, the newline
88*63d4abf0Sagccharacter is not matched.
89*63d4abf0Sagc</p>
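<p>
The effect of <code>REG_NEWLINE</code> on the dot can be checked with a small
sketch like the one below.  The function name <code>dot_matches</code> is
hypothetical, and the code assumes the POSIX <code>&lt;regex.h&gt;</code>
interface.
</p>

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles the ERE "a.b" with the given extra
   compilation flags and reports whether it matches `text`.
   Returns 1 for a match, 0 for no match, -1 on a compile error. */
int dot_matches(const char *text, int extra_cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "a.b", REG_EXTENDED | REG_NOSUB | extra_cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
Here <code>dot_matches("a\nb", 0)</code> matches, because the dot accepts the
newline, while <code>dot_matches("a\nb", REG_NEWLINE)</code> does not.
</p>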
90*63d4abf0Sagc
91*63d4abf0Sagc<p>
<tt>Comment-text</tt> can contain any characters except for a closing parenthesis <tt>)</tt>. The text in the comment is
completely ignored by the regex parser and is used solely for readability purposes.
94*63d4abf0Sagc</p>
95*63d4abf0Sagc
96*63d4abf0Sagc<h3>Repeat operators</h3>
97*63d4abf0Sagc<a name="repeat-operator"></a>
98*63d4abf0Sagc
99*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
100*63d4abf0Sagc<tr><td>
101*63d4abf0Sagc<pre>
102*63d4abf0Sagc<i>repeat-operator</i> ::= <b>"*"</b>
103*63d4abf0Sagc                |   <b>"+"</b>
104*63d4abf0Sagc                |   <b>"?"</b>
105*63d4abf0Sagc                |   <i>bound</i>
106*63d4abf0Sagc                |   <b>"*?"</b>
107*63d4abf0Sagc                |   <b>"+?"</b>
108*63d4abf0Sagc                |   <b>"??"</b>
109*63d4abf0Sagc                |   <i>bound</i> <b>?</b>
110*63d4abf0Sagc</pre>
111*63d4abf0Sagc</td></tr>
112*63d4abf0Sagc</table>
113*63d4abf0Sagc
114*63d4abf0Sagc<p>
115*63d4abf0SagcAn atom followed by <tt>*</tt> matches a sequence of 0 or more matches
116*63d4abf0Sagcof the atom.  <tt>+</tt> is similar to <tt>*</tt>, matching a sequence
117*63d4abf0Sagcof 1 or more matches of the atom.  An atom followed by <tt>?</tt>
118*63d4abf0Sagcmatches a sequence of 0 or 1 matches of the atom.
119*63d4abf0Sagc</p>
120*63d4abf0Sagc
121*63d4abf0Sagc<p>
A <i>bound</i> is one of the following, where <i>m</i> and <i>n</i>
are unsigned decimal integers between <tt>0</tt> and
<tt>RE_DUP_MAX</tt>:
125*63d4abf0Sagc</p>
126*63d4abf0Sagc
127*63d4abf0Sagc<ol>
128*63d4abf0Sagc<li><tt>{</tt><i>m</i><tt>,</tt><i>n</i><tt>}</tt></li>
129*63d4abf0Sagc<li><tt>{</tt><i>m</i><tt>,}</tt></li>
130*63d4abf0Sagc<li><tt>{</tt><i>m</i><tt>}</tt></li>
131*63d4abf0Sagc</ol>
132*63d4abf0Sagc
133*63d4abf0Sagc<p>
134*63d4abf0SagcAn atom followed by [1] matches a sequence of <i>m</i> through <i>n</i>
135*63d4abf0Sagc(inclusive) matches of the atom.  An atom followed by [2]
136*63d4abf0Sagcmatches a sequence of <i>m</i> or more matches of the atom.  An atom
137*63d4abf0Sagcfollowed by [3] matches a sequence of exactly <i>m</i> matches of the
138*63d4abf0Sagcatom.
139*63d4abf0Sagc</p>
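<p>
The three bound forms can be exercised with anchored patterns, as in this
sketch.  The helper name <code>bound_matches</code> is hypothetical, and the
code assumes the POSIX <code>&lt;regex.h&gt;</code> interface.
</p>

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the ERE `pattern` matches `text`,
   0 if it does not, and -1 if the pattern fails to compile. */
int bound_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
For example, <code>^a{2,3}$</code> accepts <tt>aa</tt> and <tt>aaa</tt> but
rejects <tt>a</tt> and <tt>aaaa</tt>; <code>^a{2,}$</code> accepts any run of two
or more; <code>^a{2}$</code> accepts only <tt>aa</tt>.
</p>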
140*63d4abf0Sagc
141*63d4abf0Sagc
142*63d4abf0Sagc<p>
143*63d4abf0SagcAdding a <tt>?</tt> to a repeat operator makes the subexpression minimal, or
144*63d4abf0Sagcnon-greedy.  Normally a repeated expression is greedy, that is, it matches as
145*63d4abf0Sagcmany characters as possible.  A non-greedy subexpression matches as few
146*63d4abf0Sagccharacters as possible.  Note that this does not (always) mean the same thing
147*63d4abf0Sagcas matching as many or few repetitions as possible.  Also note
148*63d4abf0Sagcthat <strong>minimal repetitions are not currently supported for approximate
149*63d4abf0Sagcmatching</strong>.
150*63d4abf0Sagc</p>
151*63d4abf0Sagc
152*63d4abf0Sagc<h3>Approximate matching settings</h3>
153*63d4abf0Sagc<a name="approx-settings"></a>
154*63d4abf0Sagc
155*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
156*63d4abf0Sagc<tr><td>
157*63d4abf0Sagc<pre>
158*63d4abf0Sagc<i>approx-settings</i> ::= <b>"{"</b> <i>count-limits</i>* <b>","</b>? <i>cost-equation</i>? <b>"}"</b>
159*63d4abf0Sagc
160*63d4abf0Sagc<i>count-limits</i> ::= <b>"+"</b> <i>number</i>?
161*63d4abf0Sagc             |   <b>"-"</b> <i>number</i>?
162*63d4abf0Sagc             |   <b>"#"</b> <i>number</i>?
163*63d4abf0Sagc             |   <b>"~"</b> <i>number</i>?
164*63d4abf0Sagc
165*63d4abf0Sagc<i>cost-equation</i> ::= ( <i>cost-term</i> "+"? " "? )+ <b>"&lt;"</b> <i>number</i>
166*63d4abf0Sagc
167*63d4abf0Sagc<i>cost-term</i> ::= <i>number</i> <b>"i"</b>
168*63d4abf0Sagc          |   <i>number</i> <b>"d"</b>
169*63d4abf0Sagc          |   <i>number</i> <b>"s"</b>
170*63d4abf0Sagc
171*63d4abf0Sagc</pre>
172*63d4abf0Sagc</td></tr>
173*63d4abf0Sagc</table>
174*63d4abf0Sagc
175*63d4abf0Sagc<p>
176*63d4abf0SagcThe approximate matching settings for a subpattern can be changed
177*63d4abf0Sagcby appending <i>approx-settings</i> to the subpattern.  Limits for
178*63d4abf0Sagcthe number of errors can be set and an expression for specifying and
179*63d4abf0Sagclimiting the costs can be given.
180*63d4abf0Sagc</p>
181*63d4abf0Sagc
182*63d4abf0Sagc<p>
183*63d4abf0SagcThe <i>count-limits</i> can be used to set limits for the number of
184*63d4abf0Sagcinsertions (<tt>+</tt>), deletions (<tt>-</tt>), substitutions
185*63d4abf0Sagc(<tt>#</tt>), and total number of errors (<tt>~</tt>).  If the
186*63d4abf0Sagc<i>number</i> part is omitted, the specified error count will be
187*63d4abf0Sagcunlimited.
188*63d4abf0Sagc</p>
189*63d4abf0Sagc
190*63d4abf0Sagc<p>
191*63d4abf0SagcThe <i>cost-equation</i> can be thought of as a mathematical equation,
192*63d4abf0Sagcwhere <tt>i</tt>, <tt>d</tt>, and <tt>s</tt> stand for the number of
193*63d4abf0Sagcinsertions, deletions, and substitutions, respectively.  The equation
194*63d4abf0Sagccan have a multiplier for each of <tt>i</tt>, <tt>d</tt>, and
195*63d4abf0Sagc<tt>s</tt>.  The multiplier is the cost of the error, and the number
196*63d4abf0Sagcafter <tt>&lt;</tt> is the maximum allowed cost of a match.  Spaces
197*63d4abf0Sagcand pluses can be inserted to make the equation readable.  In fact, when
198*63d4abf0Sagcspecifying only a cost equation, adding a space after the opening <tt>{</tt>
199*63d4abf0Sagcis <strong>required</strong>.
200*63d4abf0Sagc</p>
201*63d4abf0Sagc
<p>
Examples:
</p>
<dl>
205*63d4abf0Sagc<dt><tt>{~}</tt></dt>
206*63d4abf0Sagc<dd>Sets the maximum number of errors to unlimited.</dd>
207*63d4abf0Sagc<dt><tt>{~3}</tt></dt>
208*63d4abf0Sagc<dd>Sets the maximum number of errors to three.</dd>
209*63d4abf0Sagc<dt><tt>{+2~5}</tt></dt>
210*63d4abf0Sagc<dd>Sets the maximum number of errors to five, and the maximum number
211*63d4abf0Sagcof insertions to two.</dd>
212*63d4abf0Sagc<dt><tt>{&lt;3}</tt></dt>
<dd>Sets the maximum cost to three.</dd>
<dt><tt>{ 2i + 1d + 2s &lt; 5 }</tt></dt>
<dd>Sets the cost of an insertion to two, a deletion to one, a
substitution to two, and the maximum cost to five.</dd>
217*63d4abf0Sagc</dl>
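<p>
The arithmetic behind a cost equation can be sketched as follows.  This is
only an illustration of how the multipliers combine; the type
<code>edit_costs</code> and the function <code>total_cost</code> are hypothetical
names and are not part of the TRE API.  The sketch only computes the total and
leaves the comparison against the limit to the matcher.
</p>

```c
/* Per-error costs taken from a cost equation such as
   { 2i + 1d + 2s < 5 }.  Hypothetical type, not part of TRE. */
struct edit_costs {
    int insertion;
    int deletion;
    int substitution;
};

/* Total cost of a candidate match with the given error counts:
   each count is multiplied by its cost and the products are summed. */
int total_cost(struct edit_costs costs,
               int insertions, int deletions, int substitutions)
{
    return costs.insertion * insertions
         + costs.deletion * deletions
         + costs.substitution * substitutions;
}
```

<p>
With the costs from <tt>{ 2i + 1d + 2s &lt; 5 }</tt>, one insertion plus one
deletion costs 3, one insertion plus one substitution costs 4, and three
substitutions cost 6, which exceeds the limit of five.
</p>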
218*63d4abf0Sagc
219*63d4abf0Sagc
220*63d4abf0Sagc<h3>Bracket expressions</h3>
221*63d4abf0Sagc<a name="bracket-expression"></a>
222*63d4abf0Sagc
223*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
224*63d4abf0Sagc<tr><td>
225*63d4abf0Sagc<pre>
226*63d4abf0Sagc<i>bracket-expression</i> ::= <b>"["</b> <i>item</i>+ <b>"]"</b>
227*63d4abf0Sagc                   |   <b>"[^"</b> <i>item</i>+ <b>"]"</b>
228*63d4abf0Sagc</pre>
229*63d4abf0Sagc</td></tr>
230*63d4abf0Sagc</table>
231*63d4abf0Sagc
232*63d4abf0Sagc<p>
233*63d4abf0SagcA bracket expression specifies a set of characters by enclosing a
234*63d4abf0Sagcnonempty list of items in brackets.  Normally anything matching any
235*63d4abf0Sagcitem in the list is matched.  If the list begins with <tt>^</tt> the
236*63d4abf0Sagcmeaning is negated; any character matching no item in the list is
237*63d4abf0Sagcmatched.
238*63d4abf0Sagc</p>
239*63d4abf0Sagc
240*63d4abf0Sagc<p>
241*63d4abf0SagcAn item is any of the following:
242*63d4abf0Sagc</p>
243*63d4abf0Sagc<ul>
244*63d4abf0Sagc<li>A single character, matching that character.</li>
245*63d4abf0Sagc<li>Two characters separated by <tt>-</tt>.  This is shorthand for the
246*63d4abf0Sagcfull range of characters  between those two (inclusive) in the
247*63d4abf0Sagccollating sequence.  For example, <tt>[0-9]</tt> in ASCII matches any
248*63d4abf0Sagcdecimal digit.</li>
249*63d4abf0Sagc<li>A collating element enclosed in <tt>[.</tt> and <tt>.]</tt>,
250*63d4abf0Sagcmatching the collating element.  This can be used to include a literal
251*63d4abf0Sagc<tt>-</tt> or a multi-character collating element in the list.</li>
252*63d4abf0Sagc<li>A collating element enclosed in <tt>[=</tt> and <tt>=]</tt> (an
253*63d4abf0Sagcequivalence class), matching all collating elements with the same
254*63d4abf0Sagcprimary collation weight as that element, including the element
255*63d4abf0Sagcitself.</li>
256*63d4abf0Sagc<li>The name of a character class enclosed in <tt>[:</tt> and
257*63d4abf0Sagc<tt>:]</tt>, matching any character belonging to the class.  The set
258*63d4abf0Sagcof valid names depends on the <code>LC_CTYPE</code> category of the
259*63d4abf0Sagccurrent locale, but the following names are valid in all locales:
260*63d4abf0Sagc<ul>
261*63d4abf0Sagc<li><tt>alnum</tt> - alphanumeric characters</li>
262*63d4abf0Sagc<li><tt>alpha</tt> - alphabetic characters</li>
263*63d4abf0Sagc<li><tt>blank</tt> - blank characters</li>
264*63d4abf0Sagc<li><tt>cntrl</tt> - control characters</li>
265*63d4abf0Sagc<li><tt>digit</tt> - decimal digits (0 through 9)</li>
266*63d4abf0Sagc<li><tt>graph</tt> - all printable characters except space</li>
267*63d4abf0Sagc<li><tt>lower</tt> - lower-case letters</li>
268*63d4abf0Sagc<li><tt>print</tt> - printable characters including space</li>
269*63d4abf0Sagc<li><tt>punct</tt> - printable characters not space or alphanumeric</li>
270*63d4abf0Sagc<li><tt>space</tt> - white-space characters</li>
271*63d4abf0Sagc<li><tt>upper</tt> - upper case letters</li>
272*63d4abf0Sagc<li><tt>xdigit</tt> - hexadecimal digits</li>
273*63d4abf0Sagc</ul>
274*63d4abf0Sagc</ul>
275*63d4abf0Sagc<p>
276*63d4abf0SagcTo include a literal <tt>-</tt> in the list, make it either the first
277*63d4abf0Sagcor last item, the second endpoint of a range, or enclose it in
278*63d4abf0Sagc<tt>[.</tt> and <tt>.]</tt> to make it a collating element.  To
279*63d4abf0Sagcinclude a literal <tt>]</tt> in the list, make it either the first
280*63d4abf0Sagcitem, the second endpoint of a range, or enclose it in <tt>[.</tt> and
281*63d4abf0Sagc<tt>.]</tt>.  To use a literal <tt>-</tt> as the first
282*63d4abf0Sagcendpoint of a range, enclose it in <tt>[.</tt> and <tt>.]</tt>.
283*63d4abf0Sagc</p>
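<p>
A few of the bracket expression rules above can be tried out with a sketch
like this one.  The helper name <code>bracket_matches</code> is hypothetical, and
the code assumes the POSIX <code>&lt;regex.h&gt;</code> interface.
</p>

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the ERE `pattern` matches `text`,
   0 if it does not, and -1 if the pattern fails to compile. */
int bracket_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
For instance, <code>^[[:digit:]]+$</code> accepts <tt>2024</tt> but not
<tt>20x4</tt>, <code>^[]a]+$</code> treats the leading <tt>]</tt> as a literal,
and <code>^[a-]+$</code> treats the trailing <tt>-</tt> as a literal.
</p>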
284*63d4abf0Sagc
285*63d4abf0Sagc
286*63d4abf0Sagc<h3>Assertions</h3>
287*63d4abf0Sagc<a name="assertion"></a>
288*63d4abf0Sagc
289*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
290*63d4abf0Sagc<tr><td>
291*63d4abf0Sagc<pre>
292*63d4abf0Sagc<i>assertion</i> ::= <b>"^"</b>
293*63d4abf0Sagc          |   <b>"$"</b>
294*63d4abf0Sagc          |   <b>"\"</b> <i>assertion-character</i>
295*63d4abf0Sagc</pre>
296*63d4abf0Sagc</td></tr>
297*63d4abf0Sagc</table>
298*63d4abf0Sagc
299*63d4abf0Sagc<p>
300*63d4abf0SagcThe expressions <tt>^</tt> and <tt>$</tt> are called "left anchor" and
301*63d4abf0Sagc"right anchor", respectively.  The left anchor matches the empty
302*63d4abf0Sagcstring at the beginning of the string.  The right anchor matches the
303*63d4abf0Sagcempty string at the end of the string.  The behaviour of both anchors
304*63d4abf0Sagccan be varied by specifying certain execution and compilation flags;
305*63d4abf0Sagcsee the <a href="api.html">API manual</a>.
306*63d4abf0Sagc</p>
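<p>
As an example of how flags alter the anchors, the sketch below passes
compilation and execution flags through to the POSIX
<code>regcomp</code>/<code>regexec</code> functions.  The helper name
<code>anchor_matches</code> is hypothetical; <code>REG_NEWLINE</code>,
<code>REG_NOTBOL</code>, and <code>REG_NOTEOL</code> are the standard POSIX flags.
</p>

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles `pattern` as an ERE with `cflags` added,
   executes it on `text` with `eflags`, and returns 1 for a match,
   0 for no match, -1 on a compile error. */
int anchor_matches(const char *pattern, const char *text,
                   int cflags, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, eflags);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
With <code>REG_NOTBOL</code> the start of the string no longer satisfies
<tt>^</tt>, and with <code>REG_NEWLINE</code> the anchors also match next to
newline characters, so <code>^b$</code> matches the middle line of
<tt>"a\nb\nc"</tt>.
</p>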
307*63d4abf0Sagc
308*63d4abf0Sagc<p>
309*63d4abf0SagcAn assertion-character can be any of the following:
310*63d4abf0Sagc</p>
311*63d4abf0Sagc
312*63d4abf0Sagc<ul>
<li><tt>&lt;</tt> - Beginning of word</li>
<li><tt>&gt;</tt> - End of word</li>
<li><tt>b</tt> - Word boundary</li>
<li><tt>B</tt> - Non-word boundary</li>
317*63d4abf0Sagc<li><tt>d</tt> - Digit character (equivalent to <tt>[[:digit:]]</tt>)</li>
318*63d4abf0Sagc<li><tt>D</tt> - Non-digit character (equivalent to <tt>[^[:digit:]]</tt>)</li>
319*63d4abf0Sagc<li><tt>s</tt> - Space character (equivalent to <tt>[[:space:]]</tt>)</li>
320*63d4abf0Sagc<li><tt>S</tt> - Non-space character (equivalent to <tt>[^[:space:]]</tt>)</li>
321*63d4abf0Sagc<li><tt>w</tt> - Word character (equivalent to <tt>[[:alnum:]_]</tt>)</li>
322*63d4abf0Sagc<li><tt>W</tt> - Non-word character (equivalent to <tt>[^[:alnum:]_]</tt>)</li>
323*63d4abf0Sagc</ul>
324*63d4abf0Sagc
325*63d4abf0Sagc
326*63d4abf0Sagc<h3>Literals</h3>
327*63d4abf0Sagc<a name="literal"></a>
328*63d4abf0Sagc
329*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
330*63d4abf0Sagc<tr><td>
331*63d4abf0Sagc<pre>
332*63d4abf0Sagc<i>literal</i> ::= <i>ordinary-character</i>
        |   <b>"\x"</b> [<b>"0"</b>-<b>"9"</b> <b>"a"</b>-<b>"f"</b> <b>"A"</b>-<b>"F"</b>]{0,2}
        |   <b>"\x{"</b> [<b>"0"</b>-<b>"9"</b> <b>"a"</b>-<b>"f"</b> <b>"A"</b>-<b>"F"</b>]* <b>"}"</b>
335*63d4abf0Sagc        |   <b>"\"</b> <i>character</i>
336*63d4abf0Sagc</pre>
337*63d4abf0Sagc</td></tr>
338*63d4abf0Sagc</table>
339*63d4abf0Sagc<p>
340*63d4abf0SagcA literal is either an ordinary character (a character that has no
341*63d4abf0Sagcother significance in the context), an 8 bit hexadecimal encoded
342*63d4abf0Sagccharacter (e.g. <tt>\x1B</tt>), a wide hexadecimal encoded character
343*63d4abf0Sagc(e.g. <tt>\x{263a}</tt>), or an escaped character.  An escaped
344*63d4abf0Sagccharacter is a <tt>\</tt> followed by any character, and matches that
345*63d4abf0Sagccharacter.  Escaping can be used to match characters which have a
346*63d4abf0Sagcspecial meaning in regexp syntax.  A <tt>\</tt> cannot be the last
347*63d4abf0Sagccharacter of an ERE.  Escaping also allows you to include a few
348*63d4abf0Sagcnon-printable characters in the regular expression.  These special
349*63d4abf0Sagcescape sequences include:
350*63d4abf0Sagc</p>
351*63d4abf0Sagc
352*63d4abf0Sagc<ul>
<li><tt>\a</tt> - Bell character (ASCII code 7)</li>
<li><tt>\e</tt> - Escape character (ASCII code 27)</li>
<li><tt>\f</tt> - Form-feed character (ASCII code 12)</li>
<li><tt>\n</tt> - New-line/line-feed character (ASCII code 10)</li>
<li><tt>\r</tt> - Carriage return character (ASCII code 13)</li>
<li><tt>\t</tt> - Horizontal tab character (ASCII code 9)</li>
359*63d4abf0Sagc</ul>
360*63d4abf0Sagc
361*63d4abf0Sagc<p>
An ordinary character is just a single character with no other
significance, and matches that character.  A <tt>{</tt> followed by
something other than a digit is considered an ordinary character.
365*63d4abf0Sagc</p>
366*63d4abf0Sagc
367*63d4abf0Sagc
368*63d4abf0Sagc<h3>Back references</h3>
369*63d4abf0Sagc<a name="backref"></a>
370*63d4abf0Sagc
371*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
372*63d4abf0Sagc<tr><td>
373*63d4abf0Sagc<pre>
374*63d4abf0Sagc<i>back-reference</i> ::= <b>"\"</b> [<b>"1"</b>-<b>"9"</b>]
375*63d4abf0Sagc</pre>
376*63d4abf0Sagc</td></tr>
377*63d4abf0Sagc</table>
378*63d4abf0Sagc<p>
379*63d4abf0SagcA back reference is a backslash followed by a single non-zero decimal
380*63d4abf0Sagcdigit <i>d</i>.  It matches the same sequence of characters
381*63d4abf0Sagcmatched by the <i>d</i>th parenthesized subexpression.
382*63d4abf0Sagc</p>
383*63d4abf0Sagc
384*63d4abf0Sagc<p>
385*63d4abf0SagcBack references are not defined for POSIX EREs (for BREs they are),
386*63d4abf0Sagcbut many matchers, including TRE, implement back references for both
387*63d4abf0SagcEREs and BREs.
388*63d4abf0Sagc</p>
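<p>
A back reference in an ERE can be tried with the sketch below.  It assumes
that the <code>regcomp</code> implementation in use accepts back references in
EREs, as described above for TRE; since POSIX leaves this undefined, other
implementations may behave differently.  The helper name
<code>backref_matches</code> is hypothetical.
</p>

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the ERE `pattern` matches `text`,
   0 if it does not, and -1 if the pattern fails to compile.
   Subexpression offsets are not requested, only the match result. */
int backref_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
The pattern <code>^(a+)b\1$</code> (written <code>"^(a+)b\\1$"</code> in C source)
accepts <tt>aabaa</tt>, where both runs of <tt>a</tt> are identical, but rejects
<tt>aaba</tt>.
</p>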
389*63d4abf0Sagc
390*63d4abf0Sagc<h3>Options</h3>
391*63d4abf0Sagc<a name="options"></a>
392*63d4abf0Sagc<table bgcolor="#e0e0f0" cellpadding="10">
393*63d4abf0Sagc<tr><td>
394*63d4abf0Sagc<pre>
395*63d4abf0Sagc<i>options</i> ::= [<b>"i" "n" "r" "U"</b>]* (<b>"-"</b> [<b>"i" "n" "r" "U"</b>]*)?
396*63d4abf0Sagc</pre>
397*63d4abf0Sagc</td></tr>
398*63d4abf0Sagc</table>
399*63d4abf0Sagc
<p>
Options allow compile time options to be turned on or off for particular parts of the
regular expression. The options correspond to compile time flags passed to
the <code>regcomp</code> API function. If an option is specified in the first section, it is
turned on. If it is specified in the second section (after the <tt>-</tt>), it is
turned off.
</p>
405*63d4abf0Sagc<ul>
<li><tt>i</tt> - Case insensitive.</li>
<li><tt>n</tt> - Forces special handling of the newline character. See the <code>REG_NEWLINE</code> flag in
the <a href="tre-api.html">API Manual</a>.</li>
<li><tt>r</tt> - Causes the regex to be matched in a right associative manner rather than the normal
left associative manner.</li>
<li><tt>U</tt> - Forces repetition operators to be non-greedy unless a <tt>?</tt> is appended.</li>
412*63d4abf0Sagc</ul>
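<p>
For comparison, the whole-pattern counterpart of an inline option is a flag
passed to <code>regcomp</code>.  The sketch below assumes that the <tt>i</tt>
option corresponds to the POSIX <code>REG_ICASE</code> flag; the inline
<tt>(?i)</tt> syntax itself is a TRE extension and is not used here.  The helper
name <code>flag_matches</code> is hypothetical.
</p>

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles `pattern` as an ERE with `cflags` added
   and returns 1 for a match, 0 for no match, -1 on a compile error. */
int flag_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
Compiled with <code>REG_ICASE</code>, <code>^hello$</code> accepts <tt>HeLLo</tt>;
without the flag it does not.
</p>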
413*63d4abf0Sagc<h2>BRE Syntax</h2>
414*63d4abf0Sagc
415*63d4abf0Sagc<p>
416*63d4abf0SagcThe obsolete basic regexp (BRE) syntax differs from the ERE syntax as
417*63d4abf0Sagcfollows:
418*63d4abf0Sagc</p>
419*63d4abf0Sagc
420*63d4abf0Sagc<ul>
<li><tt>|</tt> is an ordinary character, and there is no equivalent
for its functionality.  <tt>+</tt> and <tt>?</tt> are ordinary
characters.</li>
424*63d4abf0Sagc<li>The delimiters for bounds are <tt>\{</tt> and <tt>\}</tt>, with
425*63d4abf0Sagc<tt>{</tt> and <tt>}</tt> by themselves ordinary characters.</li>
426*63d4abf0Sagc<li>The parentheses for nested subexpressions are <tt>\(</tt> and
427*63d4abf0Sagc<tt>\)</tt>, with <tt>(</tt> and <tt>)</tt> by themselves ordinary
428*63d4abf0Sagccharacters.</li>
429*63d4abf0Sagc<li><tt>^</tt> is an ordinary character except at the beginning of the
430*63d4abf0SagcRE or the beginning of a parenthesized subexpression.  Similarly,
431*63d4abf0Sagc<tt>$</tt> is an ordinary character except at the end of the
432*63d4abf0SagcRE or the end of a parenthesized subexpression.</li>
433*63d4abf0Sagc</ul>
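<p>
The differences listed above can be observed by compiling the same pattern
with and without <code>REG_EXTENDED</code>, as in this sketch.  The helper name
<code>syntax_matches</code> is hypothetical, and the code assumes the POSIX
<code>&lt;regex.h&gt;</code> interface; implementations that add their own BRE
extensions may differ in details not covered here.
</p>

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles `pattern` as a BRE when `cflags` is 0 or
   as an ERE when `cflags` includes REG_EXTENDED.  Returns 1 for a match,
   0 for no match, -1 on a compile error. */
int syntax_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<p>
As a BRE, <code>^a|b$</code> matches only the literal text <tt>a|b</tt>, whereas
as an ERE it matches <tt>a</tt>.  Likewise the BRE <code>^a\{2\}$</code> matches
<tt>aa</tt>, and the BRE <code>^(a)$</code> matches the literal text <tt>(a)</tt>.
</p>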