xref: /csrg-svn/lib/libcompat/regexp/regexp.3 (revision 55838)
148375Scael.\" Copyright 1991 The Regents of the University of California.
248375Scael.\" All rights reserved.
348375Scael.\"
448375Scael.\" %sccs.include.redist.man%
548375Scael.\"
6*55838Sbostic.\"     @(#)regexp.3	5.3 (Berkeley) 08/05/92
748375Scael.\"
848375Scael.Dd
948375Scael.Dt REGEXP 3
1048375Scael.Os
1148375Scael.Sh NAME
1248375Scael.Nm regcomp ,
1348375Scael.Nm regexec ,
1448375Scael.Nm regsub ,
1548375Scael.Nm regerror
1648375Scael.Nd regular expression handlers
1748375Scael.Sh SYNOPSIS
1848375Scael.Fd #include <regexp.h>
1948375Scael.Ft regexp *
2048375Scael.Fn regcomp "const char *exp"
2148375Scael.Ft int
2248375Scael.Fn regexec "const regexp *prog" "const char *string"
2348375Scael.Ft void
2448375Scael.Fn regsub "const regexp *prog" "const char *source" "char *dest"
2548375Scael.Sh DESCRIPTION
26*55838SbosticThis interface is made obsolete by
27*55838Sbostic.Xr regex 3 .
28*55838Sbostic.Pp
2948375ScaelThe
3048375Scael.Fn regcomp ,
3148375Scael.Fn regexec ,
3248375Scael.Fn regsub ,
3348375Scaeland
3448375Scael.Fn regerror
3548375Scaelfunctions
3648375Scaelimplement
3748375Scael.Xr egrep 1 Ns -style
3841347Sbosticregular expressions and supporting facilities.
3948375Scael.Pp
4048375ScaelThe
4148375Scael.Fn regcomp
4248375Scaelfunction
4341347Sbosticcompiles a regular expression into a structure of type
4448375Scael.Xr regexp ,
4541347Sbosticand returns a pointer to it.
4641347SbosticThe space has been allocated using
4748375Scael.Xr malloc 3
4841347Sbosticand may be released by
4948375Scael.Xr free .
5048375Scael.Pp
5148375ScaelThe
5248375Scael.Fn regexec
5348375Scaelfunction
5448375Scaelmatches a
5548375Scael.Dv NUL Ns -terminated
5648375Scael.Fa string
5748375Scaelagainst the compiled regular expression
5848375Scaelin
5948375Scael.Fa prog .
6041347SbosticIt returns 1 for success and 0 for failure, and adjusts the contents of
6148375Scael.Fa prog Ns 's
6248375Scael.Em startp
6348375Scaeland
6448375Scael.Em endp
6548375Scael(see below) accordingly.
6648375Scael.Pp
6741347SbosticThe members of a
6848375Scael.Xr regexp
6941347Sbosticstructure include at least the following (not necessarily in order):
7048375Scael.Bd -literal -offset indent
7141347Sbosticchar *startp[NSUBEXP];
7241347Sbosticchar *endp[NSUBEXP];
7348375Scael.Ed
7448375Scael.Pp
7541347Sbosticwhere
7648375Scael.Dv NSUBEXP
7741347Sbosticis defined (as 10) in the header file.
7848375ScaelOnce a successful
7948375Scael.Fn regexec
8048375Scaelhas been done using the
8148375Scael.Fn regexp ,
8248375Scaeleach
8348375Scael.Em startp Ns - Em endp
8448375Scaelpair describes one substring
8548375Scaelwithin the
8648375Scael.Fa string ,
8748375Scaelwith the
8848375Scael.Em startp
8948375Scaelpointing to the first character of the substring and
9048375Scaelthe
9148375Scael.Em endp
9248375Scaelpointing to the first character following the substring.
9348375ScaelThe 0th substring is the substring of
9448375Scael.Fa string
9548375Scaelthat matched the whole
9641347Sbosticregular expression.
9741347SbosticThe others are those substrings that matched parenthesized expressions
9841347Sbosticwithin the regular expression, with parenthesized expressions numbered
9941347Sbosticin left-to-right order of their opening parentheses.
10048375Scael.Pp
10148375ScaelThe
10248375Scael.Fn regsub
10348375Scaelfunction
10448375Scaelcopies
10548375Scael.Fa source
10648375Scaelto
10748375Scael.Fa dest ,
10848375Scaelmaking substitutions according to the
10948375Scaelmost recent
11048375Scael.Fn regexec
11148375Scaelperformed using
11248375Scael.Fa prog .
11348375ScaelEach instance of `&' in
11448375Scael.Fa source
11548375Scaelis replaced by the substring
11648375Scaelindicated by
11748375Scael.Em startp Ns Bq
11848375Scaeland
11948375Scael.Em endp Ns Bq .
12048375ScaelEach instance of
12148375Scael.Sq \e Ns Em n ,
12248375Scaelwhere
12348375Scael.Em n
12448375Scaelis a digit, is replaced by
12541347Sbosticthe substring indicated by
12648375Scael.Em startp Ns Bq Em n
12748375Scaeland
12848375Scael.Em endp Ns Bq Em n .
12948375ScaelTo get a literal `&' or
13048375Scael.Sq \e Ns Em n
13148375Scaelinto
13248375Scael.Fa dest ,
13348375Scaelprefix it with `\e';
13448375Scaelto get a literal `\e' preceding `&' or
13548375Scael.Sq \e Ns Em n ,
13648375Scaelprefix it with
13741347Sbosticanother `\e'.
13848375Scael.Pp
13948375ScaelThe
14048375Scael.Fn regerror
14148375Scaelfunction
14248375Scaelis called whenever an error is detected in
14348375Scael.Fn regcomp ,
14448375Scael.Fn regexec ,
14548375Scaelor
14648375Scael.Fn regsub .
14748375ScaelThe default
14848375Scael.Fn regerror
14948375Scaelwrites the string
15048375Scael.Fa msg ,
15141347Sbosticwith a suitable indicator of origin,
15241347Sbosticon the standard
15341347Sbosticerror output
15448375Scaeland invokes
15548375Scael.Xr exit 2 .
15648375ScaelThe
15748375Scael.Fn regerror
15848375Scaelfunction
15941347Sbosticcan be replaced by the user if other actions are desirable.
16048375Scael.Sh REGULAR EXPRESSION SYNTAX
16148375ScaelA regular expression is zero or more
16248375Scael.Em branches ,
16348375Scaelseparated by `|'.
16441347SbosticIt matches anything that matches one of the branches.
16548375Scael.Pp
16648375ScaelA branch is zero or more
16748375Scael.Em pieces ,
16848375Scaelconcatenated.
16941347SbosticIt matches a match for the first, followed by a match for the second, etc.
17048375Scael.Pp
17148375ScaelA piece is an
17248375Scael.Em atom
17348375Scaelpossibly followed by `*', `+', or `?'.
17441347SbosticAn atom followed by `*' matches a sequence of 0 or more matches of the atom.
17541347SbosticAn atom followed by `+' matches a sequence of 1 or more matches of the atom.
17641347SbosticAn atom followed by `?' matches a match of the atom, or the null string.
17748375Scael.Pp
17841347SbosticAn atom is a regular expression in parentheses (matching a match for the
17948375Scaelregular expression), a
18048375Scael.Em range
18148375Scael(see below), `.'
18241347Sbostic(matching any single character), `^' (matching the null string at the
18341347Sbosticbeginning of the input string), `$' (matching the null string at the
18441347Sbosticend of the input string), a `\e' followed by a single character (matching
18541347Sbosticthat character), or a single character with no other significance
18641347Sbostic(matching that character).
18748375Scael.Pp
18848375ScaelA
18948375Scael.Em range
19048375Scaelis a sequence of characters enclosed in `[]'.
19141347SbosticIt normally matches any single character from the sequence.
19241347SbosticIf the sequence begins with `^',
19348375Scaelit matches any single character
19448375Scael.Em not
19548375Scaelfrom the rest of the sequence.
19641347SbosticIf two characters in the sequence are separated by `\-', this is shorthand
19748375Scaelfor the full list of
19848375Scael.Tn ASCII
19948375Scaelcharacters between them
20041347Sbostic(e.g. `[0-9]' matches any decimal digit).
20141347SbosticTo include a literal `]' in the sequence, make it the first character
20241347Sbostic(following a possible `^').
20341347SbosticTo include a literal `\-', make it the first or last character.
20448375Scael.Sh AMBIGUITY
20541347SbosticIf a regular expression could match two different parts of the input string,
20641347Sbosticit will match the one which begins earliest.
20748375ScaelIf both begin in the same place but match different lengths, or match
20841347Sbosticthe same length in different ways, life gets messier, as follows.
20948375Scael.Pp
21041347SbosticIn general, the possibilities in a list of branches are considered in
21141347Sbosticleft-to-right order, the possibilities for `*', `+', and `?' are
21241347Sbosticconsidered longest-first, nested constructs are considered from the
21341347Sbosticoutermost in, and concatenated constructs are considered leftmost-first.
21441347SbosticThe match that will be chosen is the one that uses the earliest
21541347Sbosticpossibility in the first choice that has to be made.
21641347SbosticIf there is more than one choice, the next will be made in the same manner
21741347Sbostic(earliest possibility) subject to the decision on the first choice.
21841347SbosticAnd so forth.
21948375Scael.Pp
22048375ScaelFor example,
22148375Scael.Sq Li (ab|a)b*c
22248375Scaelcould match
22348375Scael`abc' in one of two ways.
22441347SbosticThe first choice is between `ab' and `a'; since `ab' is earlier, and does
22541347Sbosticlead to a successful overall match, it is chosen.
22641347SbosticSince the `b' is already spoken for,
22741347Sbosticthe `b*' must match its last possibility\(emthe empty string\(emsince
22841347Sbosticit must respect the earlier choice.
22948375Scael.Pp
23041347SbosticIn the particular case where no `|'s are present and there is only one
23141347Sbostic`*', `+', or `?', the net effect is that the longest possible
23241347Sbosticmatch will be chosen.
23348375ScaelSo
23448375Scael.Sq Li ab* ,
23548375Scaelpresented with `xabbbby', will match `abbbb'.
23648375ScaelNote that if
23748375Scael.Sq Li ab* ,
23848375Scaelis tried against `xabyabbbz', it
23941347Sbosticwill match `ab' just after `x', due to the begins-earliest rule.
24041347Sbostic(In effect, the decision on where to start the match is the first choice
24141347Sbosticto be made, hence subsequent choices must respect it even if this leads them
24241347Sbosticto less-preferred alternatives.)
24348375Scael.Sh RETURN VALUES
24448375ScaelThe
24548375Scael.Fn regcomp
24648375Scaelfunction
24748375Scaelreturns
24848375Scael.Dv NULL
24948375Scaelfor a failure
25048375Scael.Pf ( Fn regerror
25148375Scaelpermitting),
25241347Sbosticwhere failures are syntax errors, exceeding implementation limits,
25341347Sbosticor applying `+' or `*' to a possibly-null operand.
25448375Scael.Sh SEE ALSO
25548375Scael.Xr ed 1 ,
25648375Scael.Xr ex 1 ,
25748375Scael.Xr expr 1 ,
25848375Scael.Xr egrep 1 ,
25948375Scael.Xr fgrep 1 ,
26048375Scael.Xr grep 1 ,
26148375Scael.Xr regex 3
26248375Scael.Sh HISTORY
26348375ScaelBoth code and manual page for
26448375Scael.Fn regcomp ,
26548375Scael.Fn regexec ,
26648375Scael.Fn regsub ,
26748375Scaeland
26848375Scael.Fn regerror
26948375Scaelwere written at the University of Toronto
27048375Scaeland appeared in
27148375Scael.Bx 4.3 tahoe .
27248375ScaelThey are intended to be compatible with the Bell V8
27348375Scael.Xr regexp 3 ,
27441347Sbosticbut are not derived from Bell code.
27548375Scael.Sh BUGS
27641347SbosticEmpty branches and empty regular expressions are not portable to V8.
27748375Scael.Pp
27841347SbosticThe restriction against
27941347Sbosticapplying `*' or `+' to a possibly-null operand is an artifact of the
28041347Sbosticsimplistic implementation.
28148375Scael.Pp
28248375ScaelDoes not support
28348375Scael.Xr egrep Ns 's
28448375Scaelnewline-separated branches;
28548375Scaelneither does the V8
28648375Scael.Xr regexp 3 ,
28748375Scaelthough.
28848375Scael.Pp
28941347SbosticDue to emphasis on
29041347Sbosticcompactness and simplicity,
29141347Sbosticit's not strikingly fast.
29241347SbosticIt does give special attention to handling simple cases quickly.
293