.\"	$OpenBSD: regex.3,v 1.30 2022/09/11 06:38:10 jmc Exp $
.\"
.\" Copyright (c) 1997, Phillip F Knaack. All rights reserved.
.\"
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
.\" Copyright (c) 1992, 1993, 1994
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)regex.3	8.4 (Berkeley) 3/20/94
.\"
.Dd $Mdocdate: September 11 2022 $
.Dt REGEXEC 3
.Os
.Sh NAME
.Nm regcomp ,
.Nm regexec ,
.Nm regerror ,
.Nm regfree
.Nd regular expression routines
.Sh SYNOPSIS
.In sys/types.h
.In regex.h
.Ft int
.Fn regcomp "regex_t *preg" "const char *pattern" "int cflags"
.Pp
.Ft int
.Fn regexec "const regex_t *preg" "const char *string" "size_t nmatch" \
            "regmatch_t pmatch[]" "int eflags"
.Pp
.Ft size_t
.Fn regerror "int errcode" "const regex_t *preg" "char *errbuf" \
             "size_t errbuf_size"
.Pp
.Ft void
.Fn regfree "regex_t *preg"
.Sh DESCRIPTION
These routines implement
.St -p1003.2
regular expressions
.Pq Dq REs ;
see
.Xr re_format 7 .
.Fn regcomp
compiles an RE written as a string into an internal form,
.Fn regexec
matches that internal form against a string and reports results,
.Fn regerror
transforms error codes from either into human-readable messages, and
.Fn regfree
frees any dynamically allocated storage used by the internal form
of an RE.
.Pp
The header
.In regex.h
declares two structure types,
.Vt regex_t
and
.Vt regmatch_t ,
the former for compiled internal forms and the latter for match reporting.
It also declares the four functions,
a type
.Vt regoff_t ,
and a number of constants with names starting with
.Dv REG_ .
.Pp
.Fn regcomp
compiles the regular expression contained in the
.Fa pattern
string,
subject to the flags in
.Fa cflags ,
and places the results in the
.Vt regex_t
structure pointed to by
.Fa preg .
The
.Fa cflags
argument is the bitwise OR of zero or more of the following values:
.Bl -tag -width XREG_EXTENDEDX
.It Dv REG_EXTENDED
Compile modern
.Pq Dq extended
REs,
rather than the obsolete
.Pq Dq basic
REs that are the default.
.It Dv REG_BASIC
This is a synonym for 0,
provided as a counterpart to
.Dv REG_EXTENDED
to improve readability.
.It Dv REG_NOSPEC
Compile with recognition of all special characters turned off.
All characters are thus considered ordinary,
so the RE is a literal string.
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.Dv REG_EXTENDED
and
.Dv REG_NOSPEC
may not be used in the same call to
.Fn regcomp .
.It Dv REG_ICASE
Compile for matching that ignores upper/lower case distinctions.
See
.Xr re_format 7 .
.It Dv REG_NOSUB
Compile for matching that need only report success or failure,
not what was matched.
.It Dv REG_NEWLINE
Compile for newline-sensitive matching.
By default, newline is a completely ordinary character with no special
meaning in either REs or strings.
With this flag,
.Ql \&[^
bracket expressions and
.Ql \&.
never match newline,
a
.Ql ^
anchor matches the null string after any newline in the string
in addition to its normal function,
and the
.Ql $
anchor matches the null string before any newline in the
string in addition to its normal function.
.It Dv REG_PEND
The regular expression ends,
not at the first NUL,
but just before the character pointed to by the
.Fa re_endp
member of the structure pointed to by
.Fa preg .
The
.Fa re_endp
member is of type
.Fa const\ char\ * .
This flag permits inclusion of NULs in the RE;
they are considered ordinary characters.
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.El
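.Pp
The effect of
.Dv REG_NEWLINE
can be illustrated with a minimal sketch.
The helper name
.Fn re_matches
and its return convention are assumptions made for this example,
not part of the library.
It compiles a pattern with caller-supplied
.Fa cflags ,
adds
.Dv REG_NOSUB
because only success or failure is wanted,
and frees the compiled RE before returning:

```c
#include <sys/types.h>
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if pattern matches text, 0 if it does
 * not, and -1 if regcomp() or regexec() reports an error.
 */
int
re_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);
	/* REG_NOSUB: only success or failure is wanted, not offsets. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	if (rc == 0)
		return 1;
	return rc == REG_NOMATCH ? 0 : -1;
}
```

With a string holding the two lines "a" and "b" separated by a newline,
the pattern "^b" should match only when REG_NEWLINE is passed,
while "a.b" should match only when it is not.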
.Pp
When successful,
.Fn regcomp
returns 0 and fills in the structure pointed to by
.Fa preg .
One member of that structure
(other than
.Fa re_endp )
is publicized:
.Fa re_nsub ,
of type
.Fa size_t ,
contains the number of parenthesized subexpressions within the RE
(except that the value of this member is undefined if the
.Dv REG_NOSUB
flag was used).
If
.Fn regcomp
fails, it returns a non-zero error code;
see DIAGNOSTICS.
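.Pp
As a sketch of the success and failure paths of
.Fn regcomp ,
the following reads
.Fa re_nsub
after a successful compilation.
The helper name
.Fn count_subexpressions
and the use of \-1 to signal failure are assumptions made for this example:

```c
#include <sys/types.h>
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper: compile pattern and return the number of
 * parenthesized subexpressions (re_nsub), or -1 if regcomp() fails.
 * cflags must not include REG_NOSUB, which leaves re_nsub undefined.
 */
long
count_subexpressions(const char *pattern, int cflags)
{
	regex_t re;
	long nsub;

	assert((cflags & REG_NOSUB) == 0);
	if (regcomp(&re, pattern, cflags) != 0)
		return -1;	/* on failure there is no compiled RE to free */
	nsub = (long)re.re_nsub;
	regfree(&re);
	return nsub;
}
```

Compiled with REG_EXTENDED, the pattern "(a)(b(c))" has three subexpressions.
Compiled as a basic RE (cflags of 0), the same parentheses are ordinary
characters and the count is zero.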
.Pp
.Fn regexec
matches the compiled RE pointed to by
.Fa preg
against the
.Fa string ,
subject to the flags in
.Fa eflags ,
and reports results using
.Fa nmatch ,
.Fa pmatch ,
and the returned value.
The RE must have been compiled by a previous invocation of
.Fn regcomp .
The compiled form is not altered during execution of
.Fn regexec ,
so a single compiled RE can be used simultaneously by multiple threads.
.Pp
By default,
the NUL-terminated string pointed to by
.Fa string
is considered to be the text of an entire line, minus any terminating
newline.
The
.Fa eflags
argument is the bitwise OR of zero or more of the following values:
.Bl -tag -width XREG_STARTENDX
.It Dv REG_NOTBOL
The first character of the string is treated as the continuation
of a line.
This means that the anchors
.Ql ^ ,
.Ql [[:<:]] ,
and
.Ql \e<
do not match before it; but see
.Dv REG_STARTEND
below.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_NOTEOL
The NUL terminating
the string
does not end a line, so the
.Ql $
anchor does not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_STARTEND
The string is considered to start at
.Fa string No +
.Fa pmatch Ns [0]. Ns Fa rm_so
and to end before the byte located at
.Fa string No +
.Fa pmatch Ns [0]. Ns Fa rm_eo ,
regardless of the value of
.Fa nmatch .
See below for the definition of
.Fa pmatch
and
.Fa nmatch .
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.Pp
Without
.Dv REG_NOTBOL ,
the position
.Fa rm_so
is considered the beginning of a line, such that
.Ql ^
matches before it, and the beginning of a word if there is a word
character at this position, such that
.Ql [[:<:]]
and
.Ql \e<
match before it.
.Pp
With
.Dv REG_NOTBOL ,
the character at position
.Fa rm_so
is treated as the continuation of a line, and if
.Fa rm_so
is greater than 0, the preceding character is taken into consideration.
If the preceding character is a newline and the regular expression was compiled
with
.Dv REG_NEWLINE ,
.Ql ^
matches before the string; if the preceding character is not a word character
but the string starts with a word character,
.Ql [[:<:]]
and
.Ql \e<
match before the string.
.El
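.Pp
A common use of
.Dv REG_NOTBOL
is scanning one string for successive matches:
after the first call, the pointer handed to
.Fn regexec
no longer marks the start of a line.
The following is a minimal sketch of that loop;
the helper name
.Fn count_matches
and its treatment of empty matches (step one byte forward)
are assumptions made for this example:

```c
#include <sys/types.h>
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count successive non-overlapping matches of re
 * in s.  re must have been compiled without REG_NOSUB so that the
 * match offsets are filled in.
 */
size_t
count_matches(const regex_t *re, const char *s)
{
	regmatch_t m;
	size_t n = 0;
	int eflags = 0;

	assert(re != NULL && s != NULL);
	while (regexec(re, s, 1, &m, eflags) == 0) {
		n++;
		s += m.rm_eo;		/* resume just after this match */
		if (m.rm_so == m.rm_eo) {
			/* Empty match: step one byte to guarantee progress. */
			if (*s == '\0')
				break;
			s++;
		}
		/* s no longer points at the start of a line. */
		eflags = REG_NOTBOL;
	}
	return n;
}
```

Without the switch to REG_NOTBOL, an anchored pattern such as "^a"
would match again at every restart position in "aaa".
With it, only the first call can match.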
.Pp
See
.Xr re_format 7
for a discussion of what is matched in situations where an RE or a
portion thereof could match any of several substrings of
.Fa string .
.Pp
Normally,
.Fn regexec
returns 0 for success and the non-zero code
.Dv REG_NOMATCH
for failure.
Other non-zero error codes may be returned in exceptional situations;
see DIAGNOSTICS.
.Pp
If
.Dv REG_NOSUB
was specified in the compilation of the RE,
or if
.Fa nmatch
is 0,
.Fn regexec
ignores the
.Fa pmatch
argument (but see below for the case where
.Dv REG_STARTEND
is specified).
Otherwise,
.Fa pmatch
points to an array of
.Fa nmatch
structures of type
.Vt regmatch_t .
Such a structure has at least the members
.Fa rm_so
and
.Fa rm_eo ,
both of type
.Fa regoff_t
(a signed arithmetic type at least as large as an
.Vt off_t
and a
.Vt ssize_t ) ,
containing respectively the offset of the first character of a substring
and the offset of the first character after the end of the substring.
Offsets are measured from the beginning of the
.Fa string
argument given to
.Fn regexec .
An empty substring is denoted by equal offsets,
both indicating the character following the empty substring.
.Pp
The 0th member of the
.Fa pmatch
array is filled in to indicate what substring of
.Fa string
was matched by the entire RE.
Remaining members report what substring was matched by parenthesized
subexpressions within the RE;
member
.Va i
reports subexpression
.Va i ,
with subexpressions counted (starting at 1) by the order of their opening
parentheses in the RE, left to right.
Unused entries in the array\(emcorresponding either to subexpressions that
did not participate in the match at all, or to subexpressions that do not
exist in the RE (that is, \fIi\fR\ > \fIpreg\fR\->\fIre_nsub\fR)\(emhave both
.Fa rm_so
and
.Fa rm_eo
set to \-1.
If a subexpression participated in the match several times,
the reported substring is the last one it matched.
(Note, as an example in particular, that when the RE
.Dq (b*)+
matches
.Dq bbb ,
the parenthesized subexpression matches each of the three
.Sq b Ns s
and then
an infinite number of empty strings following the last
.Sq b ,
so the reported substring is one of the empties.)
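.Pp
The layout of the
.Fa pmatch
array can be sketched as follows.
Both helper names,
.Fn match_groups
and
.Fn submatch_copy ,
are assumptions made for this example.
The first fills the array for one match attempt;
the second copies one reported substring and treats an entry
with offsets of \-1 as unused:

```c
#include <sys/types.h>
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: compile pattern as an extended RE, match it
 * against s, and fill pm[0..n-1].  Returns 0 on a match, otherwise the
 * non-zero code from regcomp() or regexec().
 */
int
match_groups(const char *pattern, const char *s, regmatch_t *pm, size_t n)
{
	regex_t re;
	int rc;

	if ((rc = regcomp(&re, pattern, REG_EXTENDED)) != 0)
		return rc;
	rc = regexec(&re, s, n, pm, 0);
	regfree(&re);
	return rc;
}

/*
 * Hypothetical helper: copy the substring reported by m into out.
 * Returns its length, or -1 if the entry is unused (offsets of -1)
 * or the substring plus its NUL does not fit in outsz bytes.
 */
long
submatch_copy(const char *s, const regmatch_t *m, char *out, size_t outsz)
{
	size_t len;

	assert(s != NULL && m != NULL && out != NULL);
	if (m->rm_so == -1)
		return -1;
	len = (size_t)(m->rm_eo - m->rm_so);	/* rm_eo is one past the end */
	if (len >= outsz)
		return -1;
	memcpy(out, s + m->rm_so, len);
	out[len] = '\0';
	return (long)len;
}
```

For the pattern "^([a-z]+)=([0-9]+)?$" and the string "port=8080",
entry 1 covers "port" and entry 2 covers "8080".
For the string "port=", the second subexpression takes no part in the
match, so entry 2 is unused.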
.Pp
If
.Dv REG_STARTEND
is specified,
.Fa pmatch
must point to at least one
.Vt regmatch_t
(even if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified),
to hold the input offsets for
.Dv REG_STARTEND .
Use for output is still entirely controlled by
.Fa nmatch ;
if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified,
the value of
.Fa pmatch[0]
will not be changed by a successful
.Fn regexec .
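.Pp
The use of
.Fa pmatch[0]
as both input and output under
.Dv REG_STARTEND
can be sketched as follows.
Because the flag is an extension, the sketch guards it with a
preprocessor test and returns \-1 where it is not defined;
the helper name
.Fn search_range
and that fallback are assumptions made for this example:

```c
#include <sys/types.h>
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: search only bytes [start, end) of buf.
 * Returns 1 on a match (offsets in *m, measured from buf), 0 on no
 * match, and -1 on a regexec() error or if REG_STARTEND is not
 * available on this system.
 */
int
search_range(const regex_t *re, const char *buf, size_t start, size_t end,
    regmatch_t *m)
{
	assert(re != NULL && buf != NULL && m != NULL && start <= end);
#ifdef REG_STARTEND
	int rc;

	/* pmatch[0] carries the input range into regexec(). */
	m->rm_so = (regoff_t)start;
	m->rm_eo = (regoff_t)end;
	rc = regexec(re, buf, 1, m, REG_STARTEND);
	if (rc == 0)
		return 1;
	return rc == REG_NOMATCH ? 0 : -1;
#else
	/* Assumption: without the extension, report "unsupported". */
	return -1;
#endif
}
```

Since the end of the range is given explicitly,
the searched bytes need not be followed by a NUL.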
.Pp
.Fn regerror
maps a non-zero
.Va errcode
from either
.Fn regcomp
or
.Fn regexec
to a human-readable, printable message.
If
.Fa preg
is non-NULL,
the error code should have arisen from use of
the
.Vt regex_t
pointed to by
.Fa preg ,
and if the error code came from
.Fn regcomp ,
it should have been the result from the most recent
.Fn regcomp
using that
.Vt regex_t .
.Pf ( Fn regerror
may be able to supply a more detailed message using information
from the
.Vt regex_t . )
.Fn regerror
places the NUL-terminated message into the buffer pointed to by
.Fa errbuf ,
limiting the length (including the NUL) to at most
.Fa errbuf_size
bytes.
If the whole message won't fit,
as much of it as will fit before the terminating NUL is supplied.
In any case,
the returned value is the size of buffer needed to hold the whole
message (including the terminating NUL).
If
.Fa errbuf_size
is 0,
.Fa errbuf
is ignored but the return value is still correct.
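.Pp
Because the return value is the buffer size needed for the whole message,
.Fn regerror
can be called twice: once with an
.Fa errbuf_size
of 0 to learn the size, and once to fetch the text.
The following is a minimal sketch of that pattern;
the helper name
.Fn re_error_string
is an assumption made for this example:

```c
#include <sys/types.h>
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the message for
 * errcode, or NULL if allocation fails.  The caller frees the result.
 */
char *
re_error_string(int errcode, const regex_t *preg)
{
	size_t need;
	char *msg;

	assert(errcode != 0);
	/* With errbuf_size 0, errbuf is ignored; only the size is returned. */
	need = regerror(errcode, preg, NULL, 0);
	if ((msg = malloc(need)) == NULL)
		return NULL;
	(void)regerror(errcode, preg, msg, need);
	return msg;
}
```

A caller with a fixed-size buffer can instead compare the return value
against the buffer size to detect that the message was truncated.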
.Pp
If the
.Fa errcode
given to
.Fn regerror
is first OR'ed with
.Dv REG_ITOA ,
the
.Dq message
that results is the printable name of the error code,
e.g.,
.Dq REG_NOMATCH ,
rather than an explanation thereof.
If
.Fa errcode
is
.Dv REG_ATOI ,
then
.Fa preg
shall be non-NULL and the
.Fa re_endp
member of the structure it points to
must point to the printable name of an error code;
in this case, the result in
.Fa errbuf
is the decimal digits of
the numeric value of the error code
(0 if the name is not recognized).
.Dv REG_ITOA
and
.Dv REG_ATOI
are intended primarily as debugging facilities;
they are extensions,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
Be warned also that they are considered experimental and changes are possible.
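.Pp
As a sketch of the
.Dv REG_ITOA
debugging facility, the following asks for the symbolic name of an
error code where the extension is defined and falls back to the ordinary
explanation elsewhere.
The helper name
.Fn re_error_name
and the fallback are assumptions made for this example:

```c
#include <sys/types.h>
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the symbolic name of errcode into buf
 * where the REG_ITOA extension exists, else the ordinary message.
 * Returns the buffer size regerror() reports for the full text.
 */
size_t
re_error_name(int errcode, char *buf, size_t bufsize)
{
	assert(buf != NULL && bufsize > 0);
#ifdef REG_ITOA
	/* OR the code with REG_ITOA to request its printable name. */
	return regerror(REG_ITOA | errcode, NULL, buf, bufsize);
#else
	/* Assumption: no REG_ITOA on this system; use the explanation. */
	return regerror(errcode, NULL, buf, bufsize);
#endif
}
```

Since the facility is described as experimental,
output of this kind is better suited to diagnostics and logs
than to program logic.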
.Pp
.Fn regfree
frees any dynamically allocated storage associated with the compiled RE
pointed to by
.Fa preg .
The remaining
.Vt regex_t
is no longer a valid compiled RE
and the effect of supplying it to
.Fn regexec
or
.Fn regerror
is undefined.
.Pp
None of these functions references global variables except for tables
of constants;
all are safe for use from multiple threads if the arguments are safe.
.Sh IMPLEMENTATION CHOICES
There are a number of decisions that
.St -p1003.2
leaves up to the implementor,
either by explicitly saying
.Dq undefined
or by virtue of them being
forbidden by the RE grammar.
This implementation treats them as follows.
.Pp
See
.Xr re_format 7
for a discussion of the definition of case-independent matching.
.Pp
There is no particular limit on the length of REs,
except insofar as memory is limited.
Memory usage is approximately linear in RE size, and largely insensitive
to RE complexity, except for bounded repetitions.
See
.Sx BUGS
for one short RE using them
that will run almost any system out of memory.
.Pp
A backslashed character other than one specifically given a magic meaning
by
.St -p1003.2
(such magic meanings occur only in obsolete REs)
is taken as an ordinary character.
.Pp
Any unmatched
.Ql \&[
is a
.Dv REG_EBRACK
error.
.Pp
Equivalence classes cannot begin or end bracket-expression ranges.
The endpoint of one range cannot begin another.
.Pp
RE_DUP_MAX, the limit on repetition counts in bounded repetitions, is 255.
.Pp
A repetition operator (?, *, +, or bounds) cannot follow another
repetition operator.
A repetition operator cannot begin an expression or subexpression
or follow
.Ql ^
or
.Ql | .
.Pp
A
.Ql |
cannot appear first or last in a (sub)expression, or after another
.Ql | ,
i.e., an operand of
.Ql |
cannot be an empty subexpression.
An empty parenthesized subexpression,
.Ql \&(\&) ,
is legal and matches an
empty (sub)string.
An empty string is not a legal RE.
.Pp
A
.Ql {
followed by a digit is considered the beginning of bounds for a
bounded repetition, which must then follow the syntax for bounds.
A
.Ql {
.Em not
followed by a digit is considered an ordinary character.
.Pp
.Ql ^
and
.Ql $
beginning and ending subexpressions in obsolete
.Pq Dq basic
REs are anchors, not ordinary characters.
.Sh DIAGNOSTICS
Non-zero error codes from
.Fn regcomp
and
.Fn regexec
include the following:
.Pp
.Bl -tag -compact -width XREG_ECOLLATEX
.It Er REG_NOMATCH
.Fn regexec
failed to match
.It Er REG_BADPAT
invalid regular expression
.It Er REG_ECOLLATE
invalid collating element
.It Er REG_ECTYPE
invalid character class
.It Er REG_EESCAPE
\e applied to unescapable character
.It Er REG_ESUBREG
invalid backreference number
.It Er REG_EBRACK
brackets [ ] not balanced
.It Er REG_EPAREN
parentheses ( ) not balanced
.It Er REG_EBRACE
braces { } not balanced
.It Er REG_BADBR
invalid repetition count(s) in { }
.It Er REG_ERANGE
invalid character range in [ ]
.It Er REG_ESPACE
ran out of memory
.It Er REG_BADRPT
?, *, or + operand invalid
.It Er REG_EMPTY
empty (sub)expression
.It Er REG_ASSERT
.Dq can't happen
\(emyou found a bug
.It Er REG_INVARG
invalid argument, e.g., negative-length string
.El
.Sh SEE ALSO
.Xr grep 1 ,
.Xr re_format 7
.Pp
.St -p1003.2 ,
sections 2.8 (Regular Expression Notation)
and
B.5 (C Binding for Regular Expression Matching).
.Sh HISTORY
Predecessors called
.Fn regcmp
and
.Fn regex
first appeared in PWB/UNIX 1.0.
.Pp
Predecessors
.Fn re_comp
and
.Fn re_exec
first appeared in
.Bx 4.0 ,
became part of
.In unistd.h
in
.Bx 4.4 ,
and were deleted after
.Ox 5.4 .
.Pp
Functions called
.Fn regcomp ,
.Fn regexec ,
.Fn regerror ,
and
.Fn regsub
first appeared in Version\~8
.At ,
were reimplemented and declared in
.In regexp.h
for
.Bx 4.3 Tahoe ,
and were also deleted after
.Ox 5.4 .
.Pp
Taking different arguments, the POSIX
.In regex.h
functions
.Fn regcomp ,
.Fn regexec ,
.Fn regerror ,
and
.Fn regfree
appeared in
.Bx 4.4 .
.Sh AUTHORS
.An -nosplit
The
Version\~8
.At
code was implemented by
.An Rob Pike
and extracted into a library by
.An Dave Presotto .
The
.Bx 4.3 Tahoe
and
.Bx 4.4
versions were both written by
.An Henry Spencer .
.Sh BUGS
The implementation of internationalization is incomplete:
the locale is always assumed to be the default one of
.St -p1003.2 ,
and only the collating elements etc. of that locale are available.
.Pp
The back-reference code is subtle and doubts linger about its correctness
in complex cases.
.Pp
.Fn regexec
performance is poor.
This will improve with later releases.
.Fa nmatch
exceeding 0 is expensive;
.Fa nmatch
exceeding 1 is worse.
.Fn regexec
is largely insensitive to RE complexity
.Em except
that back references are massively expensive.
RE length does matter; in particular, there is a strong speed bonus
for keeping RE length under about 30 characters,
with most special characters counting roughly double.
.Pp
.Fn regcomp
implements bounded repetitions by macro expansion,
which is costly in time and space if counts are large
or bounded repetitions are nested.
An RE like, say,
.Dq ((((a{1,100}){1,100}){1,100}){1,100}){1,100}
will (eventually) run almost any existing machine out of swap space.
.Pp
There are suspected problems with response to obscure error conditions.
Notably,
certain kinds of internal overflow,
produced only by truly enormous REs or by multiply nested bounded repetitions,
are probably not handled well.
.Pp
Due to a mistake in
.St -p1003.2 ,
things like
.Ql a)b
are legal REs because
.Ql \&)
is
a special character only in the presence of a previous unmatched
.Ql \&( .
This can't be fixed until the spec is fixed.
.Pp
The standard's definition of back references is vague.
For example, does
.Dq a\e(\e(b\e)*\e2\e)*d
match
.Dq abbbd ?
Until the standard is clarified,
behavior in such cases should not be relied on.
.Pp
The implementation of word-boundary matching is a bit of a kludge,
and bugs may lurk in combinations of word-boundary matching and anchoring.