<h1>TRE Regexp Syntax</h1>

<p>
This document describes the POSIX 1003.2 extended RE (ERE) syntax and
the basic RE (BRE) syntax as implemented by TRE, and the TRE extensions
to the ERE syntax. A simple Extended Backus-Naur Form (EBNF) style
notation is used to describe the grammar.
</p>

<h2>ERE Syntax</h2>

<h3>Alternation operator</h3>
<a name="alternation"></a>
<a name="extended-regexp"></a>

<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>extended-regexp</i> ::= <a href="#branch"><i>branch</i></a>
                |   <i>extended-regexp</i> <b>"|"</b> <a href="#branch"><i>branch</i></a>
</pre>
</td></tr>
</table>
<p>
An extended regexp (ERE) is one or more <i>branches</i>, separated by
<tt>|</tt>. An ERE matches anything that matches one or more of the
branches.
</p>

<h3>Catenation of REs</h3>
<a name="catenation"></a>
<a name="branch"></a>

<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>branch</i> ::= <i>piece</i>
       |   <i>branch</i> <i>piece</i>
</pre>
</td></tr>
</table>
<p>
A branch is one or more <i>pieces</i> concatenated. It matches a
match for the first piece, followed by a match for the second piece,
and so on.
</p>


<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>piece</i> ::= <i>atom</i>
      |   <i>atom</i> <a href="#repeat-operator"><i>repeat-operator</i></a>
      |   <i>atom</i> <a href="#approx-settings"><i>approx-settings</i></a>
</pre>
</td></tr>
</table>
<p>
A piece is an <i>atom</i> possibly followed by a repeat operator or an
expression controlling approximate matching parameters for the <i>atom</i>.
</p>


<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>atom</i> ::= <b>"("</b> <i>extended-regexp</i> <b>")"</b>
     |   <a href="#bracket-expression"><i>bracket-expression</i></a>
     |   <b>"."</b>
     |   <a href="#assertion"><i>assertion</i></a>
     |   <a href="#literal"><i>literal</i></a>
     |   <a href="#backref"><i>back-reference</i></a>
     |   <b>"(?#"</b> <i>comment-text</i> <b>")"</b>
     |   <b>"(?"</b> <a href="#options"><i>options</i></a> <b>")"</b> <i>extended-regexp</i>
     |   <b>"(?"</b> <a href="#options"><i>options</i></a> <b>":"</b> <i>extended-regexp</i> <b>")"</b>
</pre>
</td></tr>
</table>
<p>
An atom is either an ERE enclosed in parentheses, a bracket
expression, a <tt>.</tt> (period), an assertion, a literal, a back
reference, a comment, or an ERE with an option setting applied to it.
</p>

<p>
The dot (<tt>.</tt>) matches any single character.
If the <code>REG_NEWLINE</code> compilation flag (see <a
href="api.html">API manual</a>) is specified, the newline
character is not matched.
</p>

<p>
<tt>Comment-text</tt> can contain any characters except for a closing parenthesis <tt>)</tt>. The text in the comment is
completely ignored by the regex parser and is used solely for readability purposes.
</p>

<h3>Repeat operators</h3>
<a name="repeat-operator"></a>

<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>repeat-operator</i> ::= <b>"*"</b>
                |   <b>"+"</b>
                |   <b>"?"</b>
                |   <i>bound</i>
                |   <b>"*?"</b>
                |   <b>"+?"</b>
                |   <b>"??"</b>
                |   <i>bound</i> <b>"?"</b>
</pre>
</td></tr>
</table>

<p>
An atom followed by <tt>*</tt> matches a sequence of 0 or more matches
of the atom. <tt>+</tt> is similar to <tt>*</tt>, matching a sequence
of 1 or more matches of the atom. An atom followed by <tt>?</tt>
matches a sequence of 0 or 1 matches of the atom.
</p>

<p>
A <i>bound</i> is one of the following, where <i>m</i> and <i>n</i>
are unsigned decimal integers between <tt>0</tt> and
<tt>RE_DUP_MAX</tt>:
</p>

<ol>
<li><tt>{</tt><i>m</i><tt>,</tt><i>n</i><tt>}</tt></li>
<li><tt>{</tt><i>m</i><tt>,}</tt></li>
<li><tt>{</tt><i>m</i><tt>}</tt></li>
</ol>

<p>
An atom followed by [1] matches a sequence of <i>m</i> through <i>n</i>
(inclusive) matches of the atom.
An atom followed by [2]
matches a sequence of <i>m</i> or more matches of the atom. An atom
followed by [3] matches a sequence of exactly <i>m</i> matches of the
atom.
</p>


<p>
Adding a <tt>?</tt> to a repeat operator makes the subexpression minimal, or
non-greedy. Normally a repeated expression is greedy, that is, it matches as
many characters as possible. A non-greedy subexpression matches as few
characters as possible. Note that this does not (always) mean the same thing
as matching as many or few repetitions as possible. Also note
that <strong>minimal repetitions are not currently supported for approximate
matching</strong>.
</p>

<h3>Approximate matching settings</h3>
<a name="approx-settings"></a>

<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>approx-settings</i> ::= <b>"{"</b> <i>count-limits</i>* <b>","</b>? <i>cost-equation</i>? <b>"}"</b>

<i>count-limits</i> ::= <b>"+"</b> <i>number</i>?
             |   <b>"-"</b> <i>number</i>?
             |   <b>"#"</b> <i>number</i>?
             |   <b>"~"</b> <i>number</i>?

<i>cost-equation</i> ::= ( <i>cost-term</i> "+"? " "? )+ <b>"&lt;"</b> <i>number</i>

<i>cost-term</i> ::= <i>number</i> <b>"i"</b>
          |   <i>number</i> <b>"d"</b>
          |   <i>number</i> <b>"s"</b>
</pre>
</td></tr>
</table>

<p>
The approximate matching settings for a subpattern can be changed
by appending <i>approx-settings</i> to the subpattern. Limits for
the number of errors can be set and an expression for specifying and
limiting the costs can be given.
</p>

<p>
The <i>count-limits</i> can be used to set limits for the number of
insertions (<tt>+</tt>), deletions (<tt>-</tt>), substitutions
(<tt>#</tt>), and total number of errors (<tt>~</tt>). If the
<i>number</i> part is omitted, the specified error count will be
unlimited.
</p>

<p>
The <i>cost-equation</i> can be thought of as a mathematical equation,
where <tt>i</tt>, <tt>d</tt>, and <tt>s</tt> stand for the number of
insertions, deletions, and substitutions, respectively. The equation
can have a multiplier for each of <tt>i</tt>, <tt>d</tt>, and
<tt>s</tt>. The multiplier is the cost of the error, and the number
after <tt>&lt;</tt> is the maximum allowed cost of a match. Spaces
and pluses can be inserted to make the equation readable. In fact, when
specifying only a cost equation, adding a space after the opening <tt>{</tt>
is <strong>required</strong>.
</p>

<p>
Examples:
</p>
<dl>
<dt><tt>{~}</tt></dt>
<dd>Sets the maximum number of errors to unlimited.</dd>
<dt><tt>{~3}</tt></dt>
<dd>Sets the maximum number of errors to three.</dd>
<dt><tt>{+2~5}</tt></dt>
<dd>Sets the maximum number of errors to five, and the maximum number
of insertions to two.</dd>
<dt><tt>{&lt;3}</tt></dt>
<dd>Sets the maximum cost to three.</dd>
<dt><tt>{ 2i + 1d + 2s &lt; 5 }</tt></dt>
<dd>Sets the cost of an insertion to two, a deletion to one, a
substitution to two, and the maximum cost to five.</dd>
</dl>


<h3>Bracket expressions</h3>
<a name="bracket-expression"></a>

<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>bracket-expression</i> ::= <b>"["</b> <i>item</i>+ <b>"]"</b>
                   |   <b>"[^"</b> <i>item</i>+ <b>"]"</b>
</pre>
</td></tr>
</table>

<p>
A bracket expression specifies a set of characters by enclosing a
nonempty list of items in brackets. Normally anything matching any
item in the list is matched. If the list begins with <tt>^</tt> the
meaning is negated; any character matching no item in the list is
matched.
</p>

<p>
An item is any of the following:
</p>
<ul>
<li>A single character, matching that character.</li>
<li>Two characters separated by <tt>-</tt>.
This is shorthand for the
full range of characters between those two (inclusive) in the
collating sequence. For example, <tt>[0-9]</tt> in ASCII matches any
decimal digit.</li>
<li>A collating element enclosed in <tt>[.</tt> and <tt>.]</tt>,
matching the collating element. This can be used to include a literal
<tt>-</tt> or a multi-character collating element in the list.</li>
<li>A collating element enclosed in <tt>[=</tt> and <tt>=]</tt> (an
equivalence class), matching all collating elements with the same
primary collation weight as that element, including the element
itself.</li>
<li>The name of a character class enclosed in <tt>[:</tt> and
<tt>:]</tt>, matching any character belonging to the class. The set
of valid names depends on the <code>LC_CTYPE</code> category of the
current locale, but the following names are valid in all locales:
<ul>
<li><tt>alnum</tt> - alphanumeric characters</li>
<li><tt>alpha</tt> - alphabetic characters</li>
<li><tt>blank</tt> - blank characters</li>
<li><tt>cntrl</tt> - control characters</li>
<li><tt>digit</tt> - decimal digits (0 through 9)</li>
<li><tt>graph</tt> - all printable characters except space</li>
<li><tt>lower</tt> - lower-case letters</li>
<li><tt>print</tt> - printable characters including space</li>
<li><tt>punct</tt> - printable characters that are neither space nor alphanumeric</li>
<li><tt>space</tt> - white-space characters</li>
<li><tt>upper</tt> - upper-case letters</li>
<li><tt>xdigit</tt> - hexadecimal digits</li>
</ul>
</li>
</ul>
<p>
To include a literal <tt>-</tt> in the list, make it either the first
or last item, the second endpoint of a range, or enclose it in
<tt>[.</tt> and <tt>.]</tt> to make it a collating element. To
include a literal <tt>]</tt> in the list, make it either the first
item, the second endpoint of a range, or enclose it in <tt>[.</tt> and
<tt>.]</tt>. To use a literal <tt>-</tt> as the first
endpoint of a range, enclose it in <tt>[.</tt> and <tt>.]</tt>.
</p>


<h3>Assertions</h3>
<a name="assertion"></a>

<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>assertion</i> ::= <b>"^"</b>
          |   <b>"$"</b>
          |   <b>"\"</b> <i>assertion-character</i>
</pre>
</td></tr>
</table>

<p>
The expressions <tt>^</tt> and <tt>$</tt> are called "left anchor" and
"right anchor", respectively. The left anchor matches the empty
string at the beginning of the string. The right anchor matches the
empty string at the end of the string. The behaviour of both anchors
can be varied by specifying certain execution and compilation flags;
see the <a href="api.html">API manual</a>.
</p>

<p>
An assertion-character can be any of the following:
</p>

<ul>
<li><tt>&lt;</tt> - Beginning of word</li>
<li><tt>&gt;</tt> - End of word</li>
<li><tt>b</tt> - Word boundary</li>
<li><tt>B</tt> - Non-word boundary</li>
<li><tt>d</tt> - Digit character (equivalent to <tt>[[:digit:]]</tt>)</li>
<li><tt>D</tt> - Non-digit character (equivalent to <tt>[^[:digit:]]</tt>)</li>
<li><tt>s</tt> - Space character (equivalent to <tt>[[:space:]]</tt>)</li>
<li><tt>S</tt> - Non-space character (equivalent to <tt>[^[:space:]]</tt>)</li>
<li><tt>w</tt> - Word character (equivalent to <tt>[[:alnum:]_]</tt>)</li>
<li><tt>W</tt> - Non-word character (equivalent to <tt>[^[:alnum:]_]</tt>)</li>
</ul>


<h3>Literals</h3>
<a name="literal"></a>

<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>literal</i> ::= <i>ordinary-character</i>
        |   <b>"\x"</b> [<b>"0"</b>-<b>"9"</b> <b>"a"</b>-<b>"f"</b> <b>"A"</b>-<b>"F"</b>]{0,2}
        |   <b>"\x{"</b> [<b>"0"</b>-<b>"9"</b> <b>"a"</b>-<b>"f"</b> <b>"A"</b>-<b>"F"</b>]* <b>"}"</b>
        |   <b>"\"</b> <i>character</i>
</pre>
</td></tr>
</table>
<p>
A literal is either an ordinary character (a character that has no
other significance in the context), an 8-bit hexadecimal encoded
character (e.g. <tt>\x1B</tt>), a wide hexadecimal encoded character
(e.g. <tt>\x{263a}</tt>), or an escaped character.
An escaped
character is a <tt>\</tt> followed by any character, and matches that
character. Escaping can be used to match characters which have a
special meaning in regexp syntax. A <tt>\</tt> cannot be the last
character of an ERE. Escaping also allows you to include a few
non-printable characters in the regular expression. These special
escape sequences include:
</p>

<ul>
<li><tt>\a</tt> - Bell character (ASCII code 7)</li>
<li><tt>\e</tt> - Escape character (ASCII code 27)</li>
<li><tt>\f</tt> - Form-feed character (ASCII code 12)</li>
<li><tt>\n</tt> - New-line/line-feed character (ASCII code 10)</li>
<li><tt>\r</tt> - Carriage return character (ASCII code 13)</li>
<li><tt>\t</tt> - Horizontal tab character (ASCII code 9)</li>
</ul>

<p>
An ordinary character is just a single character with no other
significance, and matches that character. A <tt>{</tt> followed by
something other than a digit is considered an ordinary character.
</p>


<h3>Back references</h3>
<a name="backref"></a>

<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>back-reference</i> ::= <b>"\"</b> [<b>"1"</b>-<b>"9"</b>]
</pre>
</td></tr>
</table>
<p>
A back reference is a backslash followed by a single non-zero decimal
digit <i>d</i>. It matches the same sequence of characters
matched by the <i>d</i>th parenthesized subexpression.
</p>

<p>
Back references are not defined for POSIX EREs (for BREs they are),
but many matchers, including TRE, implement back references for both
EREs and BREs.
</p>

<h3>Options</h3>
<a name="options"></a>
<table bgcolor="#e0e0f0" cellpadding="10">
<tr><td>
<pre>
<i>options</i> ::= [<b>"i" "n" "r" "U"</b>]* (<b>"-"</b> [<b>"i" "n" "r" "U"</b>]*)?
</pre>
</td></tr>
</table>

<p>
Options allow compile-time options to be turned on or off for particular parts of the
regular expression. They correspond to several of the compile-time flags passed to
the <code>regcomp</code> API function. If an option is specified in the first section, it is
turned on. If it is specified in the second section (after the <tt>-</tt>), it is
turned off.
</p>
<ul>
<li><tt>i</tt> - Case insensitive.</li>
<li><tt>n</tt> - Forces special handling of the new line character. See the <code>REG_NEWLINE</code> flag in
the <a href="tre-api.html">API Manual</a>.</li>
<li><tt>r</tt> - Causes the regex to be matched in a right-associative manner rather than the normal
left-associative manner.</li>
<li><tt>U</tt> - Forces repetition operators to be non-greedy unless a <tt>?</tt> is appended.</li>
</ul>
<h2>BRE Syntax</h2>

<p>
The obsolete basic regexp (BRE) syntax differs from the ERE syntax as
follows:
</p>

<ul>
<li><tt>|</tt> is an ordinary character, and there is no equivalent
for its functionality.
<tt>+</tt> and <tt>?</tt> are ordinary
characters.</li>
<li>The delimiters for bounds are <tt>\{</tt> and <tt>\}</tt>, with
<tt>{</tt> and <tt>}</tt> by themselves being ordinary characters.</li>
<li>The parentheses for nested subexpressions are <tt>\(</tt> and
<tt>\)</tt>, with <tt>(</tt> and <tt>)</tt> by themselves being ordinary
characters.</li>
<li><tt>^</tt> is an ordinary character except at the beginning of the
RE or the beginning of a parenthesized subexpression. Similarly,
<tt>$</tt> is an ordinary character except at the end of the
RE or the end of a parenthesized subexpression.</li>
</ul>
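<p>
The BRE differences listed above can be illustrated with a small
translation sketch. The Python function below is not part of TRE; the
name <tt>bre_to_ere</tt> is hypothetical. It assumes a simple BRE with
no bracket expressions, and it does not model the position-dependent
rules for <tt>^</tt> and <tt>$</tt>. It only swaps the escaping of
parentheses and braces and escapes the characters that are ordinary in
a BRE but are operators in an ERE.
</p>

```python
def bre_to_ere(bre: str) -> str:
    """Translate a simple BRE into an equivalent ERE (illustrative sketch).

    Assumptions: the pattern contains no bracket expressions, and the
    context-dependent treatment of ^ and $ in BREs is ignored.
    """
    swapped = set("(){}")   # operators when escaped in a BRE, bare in an ERE
    ordinary = set("|+?")   # ordinary in a BRE, operators in an ERE
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            # \( \) \{ \} become ( ) { }; other escapes are copied unchanged.
            out.append(nxt if nxt in swapped else ch + nxt)
            i += 2
        elif ch in swapped or ch in ordinary:
            # Ordinary in a BRE but special in an ERE, so escape it.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(bre_to_ere(r"\(ab\)\{2\}|c"))  # prints (ab){2}\|c
```

<p>
Back references such as <tt>\1</tt> pass through unchanged, which
matches the note above that TRE accepts them in both syntaxes. A fuller
translator would also need to copy bracket expressions through
untouched, since the characters inside them are not operators.
</p>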