.\"	$NetBSD: regex.3,v 1.22 2011/05/17 03:35:38 enami Exp $
.\"
.\" Copyright (c) 1992, 1993, 1994
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)regex.3	8.4 (Berkeley) 3/20/94
.\"
.Dd December 29, 2003
.Dt REGEX 3
.Os
.Sh NAME
.Nm regex ,
.Nm regcomp ,
.Nm regexec ,
.Nm regerror ,
.Nm regfree
.Nd regular-expression library
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In regex.h
.Ft int
.Fn regcomp "regex_t * restrict preg" "const char * restrict pattern" "int cflags"
.Ft int
.Fn regexec "const regex_t * restrict preg" "const char * restrict string" "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
.Ft size_t
.Fn regerror "int errcode" "const regex_t * restrict preg" "char * restrict errbuf" "size_t errbuf_size"
.Ft void
.Fn regfree "regex_t *preg"
.Sh DESCRIPTION
These routines implement
.St -p1003.2-92
regular expressions (``RE''s);
see
.Xr re_format 7 .
.Fn regcomp
compiles an RE written as a string into an internal form,
.Fn regexec
matches that internal form against a string and reports results,
.Fn regerror
transforms error codes from either into human-readable messages,
and
.Fn regfree
frees any dynamically-allocated storage used by the internal form
of an RE.
.Pp
The header
.In regex.h
declares two structure types,
.Fa regex_t
and
.Fa regmatch_t ,
the former for compiled internal forms and the latter for match reporting.
It also declares the four functions,
a type
.Fa regoff_t ,
and a number of constants with names starting with ``REG_''.
.Pp
.Fn regcomp
compiles the regular expression contained in the
.Fa pattern
string,
subject to the flags in
.Fa cflags ,
and places the results in the
.Fa regex_t
structure pointed to by
.Fa preg .
.Fa cflags
is the bitwise OR of zero or more of the following flags:
.Bl -tag -width XXXREG_EXTENDED
.It Dv REG_EXTENDED
Compile modern (``extended'') REs, rather than the obsolete
(``basic'') REs that are the default.
.It Dv REG_BASIC
This is a synonym for 0,
provided as a counterpart to
.Dv REG_EXTENDED
to improve readability.
.It Dv REG_NOSPEC
Compile with recognition of all special characters turned off.
All characters are thus considered ordinary, so the ``RE'' is a literal
string.
This is an extension, compatible with but not specified by
.St -p1003.2-92 ,
and should be used with caution in software intended to be portable to
other systems.
.Dv REG_EXTENDED
and
.Dv REG_NOSPEC
may not be used in the same call to
.Fn regcomp .
.It Dv REG_ICASE
Compile for matching that ignores upper/lower case distinctions.
See
.Xr re_format 7 .
.It Dv REG_NOSUB
Compile for matching that need only report success or failure, not
what was matched.
.It Dv REG_NEWLINE
Compile for newline-sensitive matching.
By default, newline is a completely ordinary character with no special
meaning in either REs or strings.
With this flag,
`[^' bracket expressions and `.' never match newline,
a `^' anchor matches the null string after any newline in the string
in addition to its normal function,
and the `$' anchor matches the null string before any newline in the
string in addition to its normal function.
.It Dv REG_PEND
The regular expression ends, not at the first NUL, but just before the
character pointed to by the
.Fa re_endp
member of the structure pointed to by
.Fa preg .
The
.Fa re_endp
member is of type
.Fa "const\ char\ *" .
This flag permits inclusion of NULs in the RE; they are considered
ordinary characters.
This is an extension, compatible with but not specified by
.St -p1003.2-92 ,
and should be used with caution in software intended to be portable to
other systems.
.El
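.Pp
As an illustration of combining compile-time flags, the following is a
minimal sketch rather than a definitive implementation.
The helper name
.Fn re_matches
is hypothetical and is not part of the library; it uses only the portable
flags
.Dv REG_EXTENDED ,
.Dv REG_ICASE ,
and
.Dv REG_NOSUB .

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the library): return 1 if `pattern'
 * matches somewhere in `text', 0 if it does not, and -1 if the pattern
 * fails to compile or regexec() reports an error other than REG_NOMATCH.
 */
int
re_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);

	/* REG_NOSUB: only success or failure is needed, not offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;

	/* With REG_NOSUB the pmatch argument is ignored. */
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);

	if (rc == 0)
		return 1;
	return (rc == REG_NOMATCH) ? 0 : -1;
}
```
.Pp
Under these assumptions, a call with
.Dv REG_EXTENDED | REG_ICASE
should accept input that differs from the pattern only in case, while the
same call without
.Dv REG_ICASE
should not.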
.Pp
When successful,
.Fn regcomp
returns 0 and fills in the structure pointed to by
.Fa preg .
One member of that structure (other than
.Fa re_endp )
is publicized:
.Fa re_nsub ,
of type
.Fa size_t ,
contains the number of parenthesized subexpressions within the RE
(except that the value of this member is undefined if the
.Dv REG_NOSUB
flag was used).
If
.Fn regcomp
fails, it returns a non-zero error code;
see
.Sx DIAGNOSTICS .
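.Pp
The
.Fa re_nsub
member is typically used to size the
.Fa pmatch
array passed to
.Fn regexec .
The following sketch shows one way to read it; the helper name
.Fn count_subexpressions
is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile `pattern' as an extended RE and return
 * the number of parenthesized subexpressions (re_nsub), or -1 if
 * compilation fails.  REG_NOSUB is deliberately not used, because
 * re_nsub is undefined when that flag is given.
 */
long
count_subexpressions(const char *pattern)
{
	regex_t re;
	long n;

	assert(pattern != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	/* A pmatch array of re_nsub + 1 entries can hold every submatch. */
	n = (long)re.re_nsub;
	regfree(&re);
	return n;
}
```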
.Pp
.Fn regexec
matches the compiled RE pointed to by
.Fa preg
against the
.Fa string ,
subject to the flags in
.Fa eflags ,
and reports results using
.Fa nmatch ,
.Fa pmatch ,
and the returned value.
The RE must have been compiled by a previous invocation of
.Fn regcomp .
The compiled form is not altered during execution of
.Fn regexec ,
so a single compiled RE can be used simultaneously by multiple threads.
.Pp
By default,
the NUL-terminated string pointed to by
.Fa string
is considered to be the text of an entire line, minus any terminating
newline.
The
.Fa eflags
argument is the bitwise OR of zero or more of the following flags:
.Bl -tag -width XXXREG_NOTBOL
.It Dv REG_NOTBOL
The first character of the string
is not the beginning of a line, so the `^' anchor should not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_NOTEOL
The NUL terminating the string does not end a line, so the `$' anchor
should not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_STARTEND
The string is considered to start at
.Fa string
+
.Fa pmatch[0].rm_so
and to have a terminating NUL located at
.Fa string
+
.Fa pmatch[0].rm_eo
(there need not actually be a NUL at that location),
regardless of the value of
.Fa nmatch .
See below for the definition of
.Fa pmatch
and
.Fa nmatch .
This is an extension, compatible with but not specified by
.St -p1003.2-92 ,
and should be used with caution in software intended to be portable to
other systems.
Note that a non-zero
.Fa rm_so
does not imply
.Dv REG_NOTBOL ;
.Dv REG_STARTEND
affects only the location of the string, not how it is matched.
.El
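.Pp
The effect of
.Dv REG_NOTBOL
and
.Dv REG_NOTEOL
can be seen with a small sketch such as the following, which might be used
when a line is handed to
.Fn regexec
in pieces.
The helper name
.Fn exec_with_flags
is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: match the extended RE `pattern' against `text'
 * with the given execution flags.  Returns 1 on a match, 0 on
 * REG_NOMATCH, and -1 on any other error.
 */
int
exec_with_flags(const char *pattern, const char *text, int eflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	/* eflags is passed through unchanged, e.g. REG_NOTBOL. */
	rc = regexec(&re, text, 0, NULL, eflags);
	regfree(&re);

	if (rc == 0)
		return 1;
	return (rc == REG_NOMATCH) ? 0 : -1;
}
```
.Pp
With this helper, the RE `^abc' should match the string `abcdef' when
.Fa eflags
is 0 but not when it is
.Dv REG_NOTBOL ,
and `def$' should stop matching once
.Dv REG_NOTEOL
is given.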
.Pp
See
.Xr re_format 7
for a discussion of what is matched in situations where an RE or a
portion thereof could match any of several substrings of
.Fa string .
.Pp
Normally,
.Fn regexec
returns 0 for success and the non-zero code
.Dv REG_NOMATCH
for failure.
Other non-zero error codes may be returned in exceptional situations;
see
.Sx DIAGNOSTICS .
.Pp
If
.Dv REG_NOSUB
was specified in the compilation of the RE, or if
.Fa nmatch
is 0,
.Fn regexec
ignores the
.Fa pmatch
argument (but see below for the case where
.Dv REG_STARTEND
is specified).
Otherwise,
.Fa pmatch
points to an array of
.Fa nmatch
structures of type
.Fa regmatch_t .
Such a structure has at least the members
.Fa rm_so
and
.Fa rm_eo ,
both of type
.Fa regoff_t
(a signed arithmetic type at least as large as an
.Fa off_t
and a
.Fa ssize_t ) ,
containing respectively the offset of the first character of a substring
and the offset of the first character after the end of the substring.
Offsets are measured from the beginning of the
.Fa string
argument given to
.Fn regexec .
An empty substring is denoted by equal offsets,
both indicating the character following the empty substring.
.Pp
The 0th member of the
.Fa pmatch
array is filled in to indicate what substring of
.Fa string
was matched by the entire RE.
Remaining members report what substring was matched by parenthesized
subexpressions within the RE;
member
.Fa i
reports subexpression
.Fa i ,
with subexpressions counted (starting at 1) by the order of their
opening parentheses in the RE, left to right.
Unused entries in the array\(emcorresponding either to subexpressions that
did not participate in the match at all, or to subexpressions that do not
exist in the RE (that is,
.Fa i
\*[Gt]
.Fa preg-\*[Gt]re_nsub )
\(emhave both
.Fa rm_so
and
.Fa rm_eo
set to -1.
If a subexpression participated in the match several times,
the reported substring is the last one it matched.
(Note, as an example in particular, that when the RE `(b*)+' matches `bbb',
the parenthesized subexpression matches each of the three `b's and then
an infinite number of empty strings following the last `b',
so the reported substring is one of the empties.)
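.Pp
The layout of the
.Fa pmatch
array described above can be sketched as follows.
This is a minimal illustration, not a definitive implementation; the helper
name
.Fn capture_offsets
is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: match the extended RE `pattern' against `text'
 * and let regexec() fill `out' (room for `nout' entries) with the
 * offsets of the whole match (entry 0) and of each subexpression.
 * Returns the number of entries that correspond to the whole RE or to
 * an existing subexpression, 0 if there is no match, or -1 on error.
 */
int
capture_offsets(const char *pattern, const char *text,
    regmatch_t *out, size_t nout)
{
	regex_t re;
	size_t n;
	int rc;

	assert(pattern != NULL && text != NULL && out != NULL && nout >= 1);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	/* Entry 0 is the whole match; entries 1..re_nsub are the groups. */
	n = re.re_nsub + 1;

	rc = regexec(&re, text, nout, out, 0);
	regfree(&re);
	if (rc == REG_NOMATCH)
		return 0;
	if (rc != 0)
		return -1;

	/*
	 * Entries for groups that did not participate, and entries past
	 * re_nsub, have rm_so and rm_eo set to -1 by regexec().
	 */
	return (int)(n < nout ? n : nout);
}
```
.Pp
For the RE `([a-z]+)=([0-9]+)' and the string `x key=42;', entry 0 should
cover `key=42', entry 1 `key', and entry 2 `42', with all offsets measured
from the start of the string.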
.Pp
If
.Dv REG_STARTEND
is specified,
.Fa pmatch
must point to at least one
.Fa regmatch_t
(even if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified),
to hold the input offsets for
.Dv REG_STARTEND .
Use for output is still entirely controlled by
.Fa nmatch ;
if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified,
the value of
.Fa pmatch[0]
will not be changed by a successful
.Fn regexec .
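.Pp
The following sketch shows one way
.Dv REG_STARTEND
might be used to search part of a buffer without copying it.
The helper name
.Fn search_range
is hypothetical, and because
.Dv REG_STARTEND
is an extension the sketch includes a copying fallback for systems whose
.In regex.h
does not define it.
The fallback is an assumption of this example and is not exactly
equivalent, since anchors then see the copied range as a complete string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: search for the compiled RE `re' (compiled
 * without REG_NOSUB) within text[start..end) and store the match
 * offsets, measured from the beginning of `text', in `*m'.
 * Returns 0 on a match, REG_NOMATCH if there is none, or another
 * non-zero code on error.
 */
int
search_range(const regex_t *re, const char *text, size_t start,
    size_t end, regmatch_t *m)
{
	assert(re != NULL && text != NULL && m != NULL && start <= end);
#ifdef REG_STARTEND
	/* On input, pmatch[0] carries the range to be searched. */
	m->rm_so = (regoff_t)start;
	m->rm_eo = (regoff_t)end;
	/* With nmatch >= 1, pmatch[0] is overwritten with the match. */
	return regexec(re, text, 1, m, REG_STARTEND);
#else
	/* Fallback: copy the range, match it, and shift the offsets. */
	{
		size_t len = end - start;
		char *copy = malloc(len + 1);
		int rc;

		if (copy == NULL)
			return REG_ESPACE;
		memcpy(copy, text + start, len);
		copy[len] = 0;
		rc = regexec(re, copy, 1, m, 0);
		free(copy);
		if (rc == 0) {
			m->rm_so += (regoff_t)start;
			m->rm_eo += (regoff_t)start;
		}
		return rc;
	}
#endif
}
```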
.Pp
.Fn regerror
maps a non-zero
.Fa errcode
from either
.Fn regcomp
or
.Fn regexec
to a human-readable, printable message.
If
.Fa preg
is non-NULL,
the error code should have arisen from use of the
.Fa regex_t
pointed to by
.Fa preg ,
and if the error code came from
.Fn regcomp ,
it should have been the result from the most recent
.Fn regcomp
using that
.Fa regex_t .
.Po Fn regerror
may be able to supply a more detailed message using information
from the
.Fa regex_t . Pc
.Fn regerror
places the NUL-terminated message into the buffer pointed to by
.Fa errbuf ,
limiting the length (including the NUL) to at most
.Fa errbuf_size
bytes.
If the whole message won't fit,
as much of it as will fit before the terminating NUL is supplied.
In any case,
the returned value is the size of buffer needed to hold the whole
message (including terminating NUL).
If
.Fa errbuf_size
is 0,
.Fa errbuf
is ignored but the return value is still correct.
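.Pp
Because the return value is the required buffer size even when
.Fa errbuf_size
is 0, a message can be fetched in two calls without guessing a length.
The following is a minimal sketch of that idiom; the helper name
.Fn regex_message
is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc'd copy of the regerror()
 * message for `errcode', or NULL if memory cannot be allocated.
 * The caller is responsible for calling free() on the result.
 */
char *
regex_message(int errcode, const regex_t *preg)
{
	size_t need;
	char *msg;

	assert(errcode != 0);

	/* First call: a size of 0 only asks how large the buffer must be. */
	need = regerror(errcode, preg, NULL, 0);
	msg = malloc(need);
	if (msg == NULL)
		return NULL;

	/* Second call: the buffer is large enough, so nothing is cut off. */
	(void)regerror(errcode, preg, msg, need);
	return msg;
}
```
.Pp
A caller would typically invoke this right after a failed
.Fn regcomp ,
passing the returned error code and the same
.Fa regex_t .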
.Pp
If the
.Fa errcode
given to
.Fn regerror
is first ORed with
.Dv REG_ITOA ,
the ``message'' that results is the printable name of the error code,
e.g. ``REG_NOMATCH'',
rather than an explanation thereof.
If
.Fa errcode
is
.Dv REG_ATOI ,
then
.Fa preg
shall be non-NULL and the
.Fa re_endp
member of the structure it points to
must point to the printable name of an error code;
in this case, the result in
.Fa errbuf
is the decimal digits of
the numeric value of the error code
(0 if the name is not recognized).
.Dv REG_ITOA
and
.Dv REG_ATOI
are intended primarily as debugging facilities;
they are extensions, compatible with but not specified by
.St -p1003.2-92 ,
and should be used with caution in software intended to be portable to
other systems.
Be warned also that they are considered experimental and changes are possible.
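.Pp
Since
.Dv REG_ITOA
is an extension, code that wants the symbolic name of an error code
may need a fallback.
The following sketch, with the hypothetical helper name
.Fn error_label ,
uses
.Dv REG_ITOA
where the header defines it and the ordinary message elsewhere.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: write a short label for `errcode' into `buf'.
 * Where the REG_ITOA extension is available the label is the symbolic
 * name (for example REG_NOMATCH); elsewhere it falls back to the
 * ordinary regerror() explanation.  Returns what regerror() returns,
 * i.e. the buffer size needed for the complete label.
 */
size_t
error_label(int errcode, char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
#ifdef REG_ITOA
	/* No particular regex_t is involved, so preg is passed as NULL. */
	return regerror(errcode | REG_ITOA, NULL, buf, size);
#else
	return regerror(errcode, NULL, buf, size);
#endif
}
```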
.Pp
.Fn regfree
frees any dynamically-allocated storage associated with the compiled RE
pointed to by
.Fa preg .
The remaining
.Fa regex_t
is no longer a valid compiled RE
and the effect of supplying it to
.Fn regexec
or
.Fn regerror
is undefined.
.Pp
None of these functions references global variables except for tables
of constants;
all are safe for use from multiple threads if the arguments are safe.
.Sh IMPLEMENTATION CHOICES
There are a number of decisions that
.St -p1003.2-92
leaves up to the implementor,
either by explicitly saying ``undefined'' or by virtue of them being
forbidden by the RE grammar.
This implementation treats them as follows.
.Pp
See
.Xr re_format 7
for a discussion of the definition of case-independent matching.
.Pp
There is no particular limit on the length of REs,
except insofar as memory is limited.
Memory usage is approximately linear in RE size, and largely insensitive
to RE complexity, except for bounded repetitions.
See
.Sx BUGS
for one short RE using them
that will run almost any system out of memory.
.Pp
A backslashed character other than one specifically given a magic meaning
by
.St -p1003.2-92
(such magic meanings occur only in obsolete [``basic''] REs)
is taken as an ordinary character.
.Pp
Any unmatched [ is a
.Dv REG_EBRACK
error.
.Pp
Equivalence classes cannot begin or end bracket-expression ranges.
The endpoint of one range cannot begin another.
.Pp
.Dv RE_DUP_MAX ,
the limit on repetition counts in bounded repetitions, is 255.
.Pp
A repetition operator (?, *, +, or bounds) cannot follow another
repetition operator.
A repetition operator cannot begin an expression or subexpression
or follow `^' or `|'.
.Pp
`|' cannot appear first or last in a (sub)expression or after another `|',
i.e. an operand of `|' cannot be an empty subexpression.
An empty parenthesized subexpression, `()', is legal and matches an
empty (sub)string.
An empty string is not a legal RE.
.Pp
A `{' followed by a digit is considered the beginning of bounds for a
bounded repetition, which must then follow the syntax for bounds.
A `{'
.Em not
followed by a digit is considered an ordinary character.
.Pp
`^' and `$' beginning and ending subexpressions in obsolete (``basic'')
REs are anchors, not ordinary characters.
.Sh DIAGNOSTICS
Non-zero error codes from
.Fn regcomp
and
.Fn regexec
include the following:
.Pp
.Bl -tag -width XXXREG_ECOLLATE -compact
.It Dv REG_NOMATCH
.Fn regexec
failed to match
.It Dv REG_BADPAT
invalid regular expression
.It Dv REG_ECOLLATE
invalid collating element
.It Dv REG_ECTYPE
invalid character class
.It Dv REG_EESCAPE
\e applied to unescapable character
.It Dv REG_ESUBREG
invalid backreference number
.It Dv REG_EBRACK
brackets [ ] not balanced
.It Dv REG_EPAREN
parentheses ( ) not balanced
.It Dv REG_EBRACE
braces { } not balanced
.It Dv REG_BADBR
invalid repetition count(s) in { }
.It Dv REG_ERANGE
invalid character range in [ ]
.It Dv REG_ESPACE
ran out of memory
.It Dv REG_BADRPT
?, *, or + operand invalid
.It Dv REG_EMPTY
empty (sub)expression
.It Dv REG_ASSERT
``can't happen''\(emyou found a bug
.It Dv REG_INVARG
invalid argument, e.g. negative-length string
.El
.Sh SEE ALSO
.Xr grep 1 ,
.Xr sed 1 ,
.Xr re_format 7
.Pp
.St -p1003.2-92 ,
sections 2.8 (Regular Expression Notation)
and
B.5 (C Binding for Regular Expression Matching).
.Sh HISTORY
Originally written by Henry Spencer.
Altered for inclusion in the
.Bx 4.4
distribution.
.Sh BUGS
There is one known functionality bug.
The implementation of internationalization is incomplete:
the locale is always assumed to be the default one of
.St -p1003.2-92 ,
and only the collating elements etc. of that locale are available.
.Pp
The back-reference code is subtle and doubts linger about its correctness
in complex cases.
.Pp
.Fn regexec
performance is poor.
This will improve with later releases.
.Fa nmatch
exceeding 0 is expensive;
.Fa nmatch
exceeding 1 is worse.
.Fn regexec
is largely insensitive to RE complexity
.Em except
that back references are massively expensive.
RE length does matter; in particular, there is a strong speed bonus
for keeping RE length under about 30 characters,
with most special characters counting roughly double.
.Pp
.Fn regcomp
implements bounded repetitions by macro expansion,
which is costly in time and space if counts are large
or bounded repetitions are nested.
An RE like, say,
`((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
will (eventually) run almost any existing machine out of swap space.
.Pp
There are suspected problems with response to obscure error conditions.
Notably,
certain kinds of internal overflow,
produced only by truly enormous REs or by multiply nested bounded repetitions,
are probably not handled well.
.Pp
Due to a mistake in
.St -p1003.2-92 ,
things like `a)b' are legal REs because `)' is a special character
only in the presence of a previous unmatched `('.
This can't be fixed until the spec is fixed.
.Pp
The standard's definition of back references is vague.
For example, does
`a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
Until the standard is clarified, behavior in such cases should not be
relied on.
.Pp
The implementation of word-boundary matching is a bit of a kludge,
and bugs may lurk in combinations of word-boundary matching and anchoring.