148375Scael.\" Copyright 1991 The Regents of the University of California. 248375Scael.\" All rights reserved. 348375Scael.\" 448375Scael.\" %sccs.include.redist.man% 548375Scael.\" 6*55838Sbostic.\" @(#)regexp.3 5.3 (Berkeley) 08/05/92 748375Scael.\" 848375Scael.Dd 948375Scael.Dt REGEXP 3 1048375Scael.Os 1148375Scael.Sh NAME 1248375Scael.Nm regcomp , 1348375Scael.Nm regexec , 1448375Scael.Nm regsub , 1548375Scael.Nm regerror 1648375Scael.Nd regular expression handlers 1748375Scael.Sh SYNOPSIS 1848375Scael.Fd #include <regexp.h> 1948375Scael.Ft regexp * 2048375Scael.Fn regcomp "const char *exp" 2148375Scael.Ft int 2248375Scael.Fn regexec "const regexp *prog" "const char *string" 2348375Scael.Ft void 2448375Scael.Fn regsub "const regexp *prog" "const char *source" "char *dest" 2548375Scael.Sh DESCRIPTION 26*55838SbosticThis interface is made obsolete by 27*55838Sbostic.Xr regex 3 . 28*55838Sbostic.Pp 2948375ScaelThe 3048375Scael.Fn regcomp , 3148375Scael.Fn regexec , 3248375Scael.Fn regsub , 3348375Scaeland 3448375Scael.Fn regerror 3548375Scaelfunctions 3648375Scaelimplement 3748375Scael.Xr egrep 1 Ns -style 3841347Sbosticregular expressions and supporting facilities. 3948375Scael.Pp 4048375ScaelThe 4148375Scael.Fn regcomp 4248375Scaelfunction 4341347Sbosticcompiles a regular expression into a structure of type 4448375Scael.Xr regexp , 4541347Sbosticand returns a pointer to it. 4641347SbosticThe space has been allocated using 4748375Scael.Xr malloc 3 4841347Sbosticand may be released by 4948375Scael.Xr free . 5048375Scael.Pp 5148375ScaelThe 5248375Scael.Fn regexec 5348375Scaelfunction 5448375Scaelmatches a 5548375Scael.Dv NUL Ns -terminated 5648375Scael.Fa string 5748375Scaelagainst the compiled regular expression 5848375Scaelin 5948375Scael.Fa prog . 6041347SbosticIt returns 1 for success and 0 for failure, and adjusts the contents of 6148375Scael.Fa prog Ns 's 6248375Scael.Em startp 6348375Scaeland 6448375Scael.Em endp 6548375Scael(see below) accordingly. 6648375Scael.Pp 6741347SbosticThe members of a 6848375Scael.Xr regexp 6941347Sbosticstructure include at least the following (not necessarily in order): 7048375Scael.Bd -literal -offset indent 7141347Sbosticchar *startp[NSUBEXP]; 7241347Sbosticchar *endp[NSUBEXP]; 7348375Scael.Ed 7448375Scael.Pp 7541347Sbosticwhere 7648375Scael.Dv NSUBEXP 7741347Sbosticis defined (as 10) in the header file. 7848375ScaelOnce a successful 7948375Scael.Fn regexec 8048375Scaelhas been done using the 8148375Scael.Fn regexp , 8248375Scaeleach 8348375Scael.Em startp Ns - Em endp 8448375Scaelpair describes one substring 8548375Scaelwithin the 8648375Scael.Fa string , 8748375Scaelwith the 8848375Scael.Em startp 8948375Scaelpointing to the first character of the substring and 9048375Scaelthe 9148375Scael.Em endp 9248375Scaelpointing to the first character following the substring. 9348375ScaelThe 0th substring is the substring of 9448375Scael.Fa string 9548375Scaelthat matched the whole 9641347Sbosticregular expression. 9741347SbosticThe others are those substrings that matched parenthesized expressions 9841347Sbosticwithin the regular expression, with parenthesized expressions numbered 9941347Sbosticin left-to-right order of their opening parentheses. 10048375Scael.Pp 10148375ScaelThe 10248375Scael.Fn regsub 10348375Scaelfunction 10448375Scaelcopies 10548375Scael.Fa source 10648375Scaelto 10748375Scael.Fa dest , 10848375Scaelmaking substitutions according to the 10948375Scaelmost recent 11048375Scael.Fn regexec 11148375Scaelperformed using 11248375Scael.Fa prog . 11348375ScaelEach instance of `&' in 11448375Scael.Fa source 11548375Scaelis replaced by the substring 11648375Scaelindicated by 11748375Scael.Em startp Ns Bq 11848375Scaeland 11948375Scael.Em endp Ns Bq . 12048375ScaelEach instance of 12148375Scael.Sq \e Ns Em n , 12248375Scaelwhere 12348375Scael.Em n 12448375Scaelis a digit, is replaced by 12541347Sbosticthe substring indicated by 12648375Scael.Em startp Ns Bq Em n 12748375Scaeland 12848375Scael.Em endp Ns Bq Em n . 12948375ScaelTo get a literal `&' or 13048375Scael.Sq \e Ns Em n 13148375Scaelinto 13248375Scael.Fa dest , 13348375Scaelprefix it with `\e'; 13448375Scaelto get a literal `\e' preceding `&' or 13548375Scael.Sq \e Ns Em n , 13648375Scaelprefix it with 13741347Sbosticanother `\e'. 13848375Scael.Pp 13948375ScaelThe 14048375Scael.Fn regerror 14148375Scaelfunction 14248375Scaelis called whenever an error is detected in 14348375Scael.Fn regcomp , 14448375Scael.Fn regexec , 14548375Scaelor 14648375Scael.Fn regsub . 14748375ScaelThe default 14848375Scael.Fn regerror 14948375Scaelwrites the string 15048375Scael.Fa msg , 15141347Sbosticwith a suitable indicator of origin, 15241347Sbosticon the standard 15341347Sbosticerror output 15448375Scaeland invokes 15548375Scael.Xr exit 2 . 15648375ScaelThe 15748375Scael.Fn regerror 15848375Scaelfunction 15941347Sbosticcan be replaced by the user if other actions are desirable. 16048375Scael.Sh REGULAR EXPRESSION SYNTAX 16148375ScaelA regular expression is zero or more 16248375Scael.Em branches , 16348375Scaelseparated by `|'. 16441347SbosticIt matches anything that matches one of the branches. 16548375Scael.Pp 16648375ScaelA branch is zero or more 16748375Scael.Em pieces , 16848375Scaelconcatenated. 16941347SbosticIt matches a match for the first, followed by a match for the second, etc. 17048375Scael.Pp 17148375ScaelA piece is an 17248375Scael.Em atom 17348375Scaelpossibly followed by `*', `+', or `?'. 17441347SbosticAn atom followed by `*' matches a sequence of 0 or more matches of the atom. 17541347SbosticAn atom followed by `+' matches a sequence of 1 or more matches of the atom. 17641347SbosticAn atom followed by `?' matches a match of the atom, or the null string. 17748375Scael.Pp 17841347SbosticAn atom is a regular expression in parentheses (matching a match for the 17948375Scaelregular expression), a 18048375Scael.Em range 18148375Scael(see below), `.' 18241347Sbostic(matching any single character), `^' (matching the null string at the 18341347Sbosticbeginning of the input string), `$' (matching the null string at the 18441347Sbosticend of the input string), a `\e' followed by a single character (matching 18541347Sbosticthat character), or a single character with no other significance 18641347Sbostic(matching that character). 18748375Scael.Pp 18848375ScaelA 18948375Scael.Em range 19048375Scaelis a sequence of characters enclosed in `[]'. 19141347SbosticIt normally matches any single character from the sequence. 19241347SbosticIf the sequence begins with `^', 19348375Scaelit matches any single character 19448375Scael.Em not 19548375Scaelfrom the rest of the sequence. 19641347SbosticIf two characters in the sequence are separated by `\-', this is shorthand 19748375Scaelfor the full list of 19848375Scael.Tn ASCII 19948375Scaelcharacters between them 20041347Sbostic(e.g. `[0-9]' matches any decimal digit). 20141347SbosticTo include a literal `]' in the sequence, make it the first character 20241347Sbostic(following a possible `^'). 20341347SbosticTo include a literal `\-', make it the first or last character. 20448375Scael.Sh AMBIGUITY 20541347SbosticIf a regular expression could match two different parts of the input string, 20641347Sbosticit will match the one which begins earliest. 20748375ScaelIf both begin in the same place but match different lengths, or match 20841347Sbosticthe same length in different ways, life gets messier, as follows. 20948375Scael.Pp 21041347SbosticIn general, the possibilities in a list of branches are considered in 21141347Sbosticleft-to-right order, the possibilities for `*', `+', and `?' are 21241347Sbosticconsidered longest-first, nested constructs are considered from the 21341347Sbosticoutermost in, and concatenated constructs are considered leftmost-first. 21441347SbosticThe match that will be chosen is the one that uses the earliest 21541347Sbosticpossibility in the first choice that has to be made. 21641347SbosticIf there is more than one choice, the next will be made in the same manner 21741347Sbostic(earliest possibility) subject to the decision on the first choice. 21841347SbosticAnd so forth. 21948375Scael.Pp 22048375ScaelFor example, 22148375Scael.Sq Li (ab|a)b*c 22248375Scaelcould match 22348375Scael`abc' in one of two ways. 22441347SbosticThe first choice is between `ab' and `a'; since `ab' is earlier, and does 22541347Sbosticlead to a successful overall match, it is chosen. 22641347SbosticSince the `b' is already spoken for, 22741347Sbosticthe `b*' must match its last possibility\(emthe empty string\(emsince 22841347Sbosticit must respect the earlier choice. 22948375Scael.Pp 23041347SbosticIn the particular case where no `|'s are present and there is only one 23141347Sbostic`*', `+', or `?', the net effect is that the longest possible 23241347Sbosticmatch will be chosen. 23348375ScaelSo 23448375Scael.Sq Li ab* , 23548375Scaelpresented with `xabbbby', will match `abbbb'. 23648375ScaelNote that if 23748375Scael.Sq Li ab* , 23848375Scaelis tried against `xabyabbbz', it 23941347Sbosticwill match `ab' just after `x', due to the begins-earliest rule. 24041347Sbostic(In effect, the decision on where to start the match is the first choice 24141347Sbosticto be made, hence subsequent choices must respect it even if this leads them 24241347Sbosticto less-preferred alternatives.) 24348375Scael.Sh RETURN VALUES 24448375ScaelThe 24548375Scael.Fn regcomp 24648375Scaelfunction 24748375Scaelreturns 24848375Scael.Dv NULL 24948375Scaelfor a failure 25048375Scael.Pf ( Fn regerror 25148375Scaelpermitting), 25241347Sbosticwhere failures are syntax errors, exceeding implementation limits, 25341347Sbosticor applying `+' or `*' to a possibly-null operand. 25448375Scael.Sh SEE ALSO 25548375Scael.Xr ed 1 , 25648375Scael.Xr ex 1 , 25748375Scael.Xr expr 1 , 25848375Scael.Xr egrep 1 , 25948375Scael.Xr fgrep 1 , 26048375Scael.Xr grep 1 , 26148375Scael.Xr regex 3 26248375Scael.Sh HISTORY 26348375ScaelBoth code and manual page for 26448375Scael.Fn regcomp , 26548375Scael.Fn regexec , 26648375Scael.Fn regsub , 26748375Scaeland 26848375Scael.Fn regerror 26948375Scaelwere written at the University of Toronto 27048375Scaeland appeared in 27148375Scael.Bx 4.3 tahoe . 27248375ScaelThey are intended to be compatible with the Bell V8 27348375Scael.Xr regexp 3 , 27441347Sbosticbut are not derived from Bell code. 27548375Scael.Sh BUGS 27641347SbosticEmpty branches and empty regular expressions are not portable to V8. 27748375Scael.Pp 27841347SbosticThe restriction against 27941347Sbosticapplying `*' or `+' to a possibly-null operand is an artifact of the 28041347Sbosticsimplistic implementation. 28148375Scael.Pp 28248375ScaelDoes not support 28348375Scael.Xr egrep Ns 's 28448375Scaelnewline-separated branches; 28548375Scaelneither does the V8 28648375Scael.Xr regexp 3 , 28748375Scaelthough. 28848375Scael.Pp 28941347SbosticDue to emphasis on 29041347Sbosticcompactness and simplicity, 29141347Sbosticit's not strikingly fast. 29241347SbosticIt does give special attention to handling simple cases quickly. 293