REGEXP 6
NAME
regexp, regex - regular expression notation
DESCRIPTION
A "regular expression" specifies a set of strings of characters. A member of this set of strings is said to be matched by the regular expression. In many applications a delimiter character, commonly /, bounds a regular expression. In the following specification for regular expressions the word 'character' means any character (rune) but newline.

The syntax for a regular expression e0 is

    e3:  literal | charclass | '.' | '^' | '$' | '(' e0 ')'
    e2:  e3
      |  e2 REP
    REP: '*' | '+' | '?'
    e1:  e2
      |  e1 e2
    e0:  e1
      |  e0 '|' e1
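The grammar can be read as a recipe for a recursive-descent parser. The following is a minimal sketch in Python, not the implementation used by any Inferno tool: the function names and the nested-tuple tree shape are assumptions made for illustration, the left-recursive rules are rewritten as loops, and a backslash is accepted before any character rather than only before metacharacters and the delimiter.

```python
# Sketch of a recursive-descent parser for the grammar above.
# All names and the tuple-based tree are hypothetical, for illustration only.

META = set(".*+?[]()|\\^$")

def parse(pattern):
    """Parse pattern as an e0 and return a nested-tuple syntax tree."""
    tree, pos = parse_e0(pattern, 0)
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at offset {pos}")
    return tree

def parse_e0(p, i):
    # e0: e1 | e0 '|' e1   (left recursion handled with a loop)
    left, i = parse_e1(p, i)
    while i < len(p) and p[i] == "|":
        right, i = parse_e1(p, i + 1)
        left = ("alt", left, right)
    return left, i

def parse_e1(p, i):
    # e1: e2 | e1 e2       (one or more e2 in sequence)
    left, i = parse_e2(p, i)
    while i < len(p) and p[i] not in "|)":
        right, i = parse_e2(p, i)
        left = ("cat", left, right)
    return left, i

def parse_e2(p, i):
    # e2: e3 | e2 REP      (an e3 followed by any number of REP operators)
    node, i = parse_e3(p, i)
    while i < len(p) and p[i] in "*+?":
        node = ("rep", p[i], node)
        i += 1
    return node, i

def parse_e3(p, i):
    # e3: literal | charclass | '.' | '^' | '$' | '(' e0 ')'
    if i >= len(p):
        raise ValueError("unexpected end of pattern")
    c = p[i]
    if c == "(":
        node, i = parse_e0(p, i + 1)
        if i >= len(p) or p[i] != ")":
            raise ValueError("missing ')'")
        return ("group", node), i + 1
    if c == "[":
        return parse_charclass(p, i)
    if c in ".^$":
        return (c,), i + 1
    if c == "\\":
        if i + 1 >= len(p):
            raise ValueError("trailing backslash")
        return ("lit", p[i + 1]), i + 2   # escaped character taken literally
    if c in META:
        raise ValueError(f"unexpected {c!r} at offset {i}")
    return ("lit", c), i + 1

def parse_charclass(p, i):
    # Keeps the class body s as raw text; a backslash protects the next character.
    j = i + 1
    negated = j < len(p) and p[j] == "^"
    if negated:
        j += 1
    start = j
    while j < len(p) and p[j] != "]":
        j += 2 if p[j] == "\\" else 1
    if j >= len(p) or j == start:
        raise ValueError("unterminated or empty character class")
    return ("class", negated, p[start:j]), j + 1
```

In this sketch, parse("ab*|c") yields an "alt" node whose left branch is the concatenation of a with b* and whose right branch is the literal c, which mirrors the precedence the grammar gives: repetition binds tighter than concatenation, and concatenation tighter than alternation.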

A literal is any non-metacharacter, or a metacharacter (one of .*+?[]()|\^$), or the delimiter preceded by \.

A charclass is a nonempty string s bracketed [s] (or [^s]); it matches any character in (or not in) s. A negated character class never matches newline. A substring a-b, with a and b in ascending order, stands for the inclusive range of characters between a and b. In s, the metacharacters -, ], an initial ^, and the regular expression delimiter must be preceded by a \; other metacharacters have no special meaning and may appear unescaped.
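The character class rules can be sketched as a small membership test. This is an illustrative Python sketch, not code from any Inferno library; the names expand_class and class_matches are hypothetical, and an unescaped - that does not sit between two characters is simply treated as a literal here.

```python
def expand_class(body):
    """Return the set of characters named by the class body s."""
    # First pass: resolve backslash escapes, remembering which items were escaped.
    items = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            items.append((body[i + 1], True))
            i += 2
        else:
            items.append((body[i], False))
            i += 1
    # Second pass: an unescaped '-' between two characters marks a range a-b.
    chars = set()
    k = 0
    while k < len(items):
        lo = items[k][0]
        if k + 2 < len(items) and items[k + 1] == ("-", False):
            hi = items[k + 2][0]
            if ord(lo) > ord(hi):
                raise ValueError("range endpoints must be in ascending order")
            chars.update(chr(x) for x in range(ord(lo), ord(hi) + 1))
            k += 3
        else:
            chars.add(lo)
            k += 1
    return chars

def class_matches(body, negated, ch):
    """Return True if ch is matched by [body] (or [^body] when negated)."""
    if ch == "\n":
        return False  # newline is not a 'character'; a negated class never matches it
    return (ch in expand_class(body)) != negated
```

For example, class_matches("0-9", True, "x") is true because x lies outside the range, while class_matches("0-9", True, "\n") is false even though newline is also outside it.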

A . matches any character.

A ^ matches the beginning of a line; $ matches the end of the line.

The REP operators match zero or more (*), one or more (+), or zero or one (?) instances, respectively, of the preceding regular expression e2.
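The three operators can be compared side by side. The sketch below uses Python's re module, which is a different engine from the one documented here and is assumed only to agree with this notation on these simple patterns.

```python
import re

cases = ["ac", "abc", "abbc"]

# Each list records, per input string, whether the whole string matches.
star = [bool(re.fullmatch("ab*c", s)) for s in cases]  # zero or more b
plus = [bool(re.fullmatch("ab+c", s)) for s in cases]  # one or more b
opt = [bool(re.fullmatch("ab?c", s)) for s in cases]   # zero or one b

print(star)  # [True, True, True]
print(plus)  # [False, True, True]
print(opt)   # [True, True, False]
```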

A concatenated regular expression, e1 e2, matches a match to e1 followed by a match to e2.

An alternative regular expression, e0|e1, matches either a match to e0 or a match to e1.
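Because alternation joins whole concatenations, parentheses are needed to confine it to part of an expression. The following sketch illustrates this with Python's re module, again an assumption of convenience rather than the engine this page describes; whole-string matching is used so that differences between engines in how they choose among alternatives do not matter.

```python
import re

candidates = ["ab", "cd", "abd", "acd"]

# "ab|cd": the alternatives are the concatenations ab and cd.
flat = [s for s in candidates if re.fullmatch("ab|cd", s)]

# "a(b|c)d": the parentheses limit the alternation to b or c.
grouped = [s for s in candidates if re.fullmatch("a(b|c)d", s)]

print(flat)     # ['ab', 'cd']
print(grouped)  # ['abd', 'acd']
```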

A match to any part of a regular expression extends as far as possible without preventing a match to the remainder of the regular expression.
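A short experiment shows both halves of this rule: a repetition takes as much as it can, but gives characters back when the rest of the expression needs them. The sketch uses Python's re module as a stand-in; it is a different engine, and its capture groups, used here only to expose what each part matched, are not part of the notation described on this page.

```python
import re

# .* extends to the last b it can reach, not the first.
longest = re.search("a.*b", "xaabab").group()

# a* would like all three a's, but must leave one for the trailing "ab".
parts = re.fullmatch("(a*)(ab)", "aaab").groups()

print(longest)  # aabab
print(parts)    # ('aa', 'ab')
```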

SEE ALSO
acme(1), sh-regex(1), regex(2)