.\" $NetBSD: regex.3,v 1.22 2011/05/17 03:35:38 enami Exp $
.\"
.\" Copyright (c) 1992, 1993, 1994
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)regex.3 8.4 (Berkeley) 3/20/94
.\"
.Dd December 29, 2003
.Dt REGEX 3
.Os
.Sh NAME
.Nm regex ,
.Nm regcomp ,
.Nm regexec ,
.Nm regerror ,
.Nm regfree
.Nd regular-expression library
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In regex.h
.Ft int
.Fn regcomp "regex_t * restrict preg" "const char * restrict pattern" "int cflags"
.Ft int
.Fn regexec "const regex_t * restrict preg" "const char * restrict string" "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
.Ft size_t
.Fn regerror "int errcode" "const regex_t * restrict preg" "char * restrict errbuf" "size_t errbuf_size"
.Ft void
.Fn regfree "regex_t *preg"
.Sh DESCRIPTION
These routines implement
.St -p1003.2-92
regular expressions (``RE''s);
see
.Xr re_format 7 .
.Fn regcomp
compiles an RE written as a string into an internal form,
.Fn regexec
matches that internal form against a string and reports results,
.Fn regerror
transforms error codes from either into human-readable messages,
and
.Fn regfree
frees any dynamically-allocated storage used by the internal form
of an RE.
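.Pp
The compile, match, and free sequence summarized above can be sketched as follows.
This is a minimal illustration, not part of the library: the helper name
re_matches and its return convention (1, 0, or -1) are assumptions made for
this example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a library function): returns 1 if `text`
 * contains a match for the extended RE `pattern`, 0 if it does not,
 * and -1 if the pattern fails to compile or regexec() reports an error.
 */
int
re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only success or failure is wanted, not offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);

    /* Release the storage owned by the compiled form. */
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

A caller would test the result against 1, 0, and -1; the compiled form is
freed on every path that follows a successful regcomp.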
.Pp
The header
.In regex.h
declares two structure types,
.Fa regex_t
and
.Fa regmatch_t ,
the former for compiled internal forms and the latter for match reporting.
It also declares the four functions,
a type
.Fa regoff_t ,
and a number of constants with names starting with ``REG_''.
.Pp
.Fn regcomp
compiles the regular expression contained in the
.Fa pattern
string,
subject to the flags in
.Fa cflags ,
and places the results in the
.Fa regex_t
structure pointed to by
.Fa preg .
.Fa cflags
is the bitwise OR of zero or more of the following flags:
.Bl -tag -width XXXREG_EXTENDED
.It Dv REG_EXTENDED
Compile modern (``extended'') REs, rather than the obsolete
(``basic'') REs that are the default.
.It Dv REG_BASIC
This is a synonym for 0,
provided as a counterpart to REG_EXTENDED to improve readability.
.It Dv REG_NOSPEC
Compile with recognition of all special characters turned off.
All characters are thus considered ordinary, so the ``RE'' is a literal
string.
This is an extension, compatible with but not specified by
.St -p1003.2-92 ,
and should be used with caution in software intended to be portable to
other systems.
.Dv REG_EXTENDED
and
.Dv REG_NOSPEC
may not be used in the same call to
.Fn regcomp .
.It Dv REG_ICASE
Compile for matching that ignores upper/lower case distinctions.
See
.Xr re_format 7 .
.It Dv REG_NOSUB
Compile for matching that need only report success or failure, not
what was matched.
.It Dv REG_NEWLINE
Compile for newline-sensitive matching.
By default, newline is a completely ordinary character with no special
meaning in either REs or strings.
With this flag,
`[^' bracket expressions and `.' never match newline,
a `^' anchor matches the null string after any newline in the string
in addition to its normal function,
and the `$' anchor matches the null string before any newline in the
string in addition to its normal function.
.It Dv REG_PEND
The regular expression ends, not at the first NUL, but just before the
character pointed to by the
.Fa re_endp
member of the structure pointed to by
.Fa preg .
The
.Fa re_endp
member is of type
.Fa "const\ char\ *" .
This flag permits inclusion of NULs in the RE; they are considered
ordinary characters.
This is an extension, compatible with but not specified by
.St -p1003.2-92 ,
and should be used with caution in software intended to be portable to
other systems.
.El
.Pp
When successful,
.Fn regcomp
returns 0 and fills in the structure pointed to by
.Fa preg .
One member of that structure (other than
.Fa re_endp )
is publicized:
.Fa re_nsub ,
of type
.Fa size_t ,
contains the number of parenthesized subexpressions within the RE
(except that the value of this member is undefined if the
.Dv REG_NOSUB
flag was used).
If
.Fn regcomp
fails, it returns a non-zero error code;
see
.Sx DIAGNOSTICS .
.Pp
.Fn regexec
matches the compiled RE pointed to by
.Fa preg
against the
.Fa string ,
subject to the flags in
.Fa eflags ,
and reports results using
.Fa nmatch ,
.Fa pmatch ,
and the returned value.
The RE must have been compiled by a previous invocation of
.Fn regcomp .
The compiled form is not altered during execution of
.Fn regexec ,
so a single compiled RE can be used simultaneously by multiple threads.
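.Pp
As a rough sketch of how the compile flags and the re_nsub member described
above behave, the two helpers below may be useful.
The names re_group_count and re_search_flags are hypothetical and exist only
for this example; only the portable flags are used, and the extensions
(REG_BASIC, REG_NOSPEC, REG_PEND) are deliberately left out.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of parenthesized subexpressions in
 * `pattern` when compiled with `cflags`, or -1 on failure.
 */
long
re_group_count(const char *pattern, int cflags)
{
    regex_t re;
    long n;

    /* re_nsub is undefined under REG_NOSUB, so refuse that flag here. */
    if ((cflags & REG_NOSUB) != 0)
        return -1;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    n = (long)re.re_nsub;
    regfree(&re);
    return n;
}

/*
 * Hypothetical helper: 1 on match, 0 on REG_NOMATCH, -1 on any error.
 * REG_NOSUB is added because no offsets are requested.
 */
int
re_search_flags(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

For instance, the pattern `^b$' searched in the three-line string
"a\nb\nc" is expected to match only when REG_NEWLINE is included in cflags,
since otherwise newline is an ordinary character and the anchors apply only
to the ends of the whole string.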
.Pp
By default,
the NUL-terminated string pointed to by
.Fa string
is considered to be the text of an entire line, minus any terminating
newline.
The
.Fa eflags
argument is the bitwise OR of zero or more of the following flags:
.Bl -tag -width XXXREG_NOTBOL
.It Dv REG_NOTBOL
The first character of the string
is not the beginning of a line, so the `^' anchor should not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_NOTEOL
The NUL terminating the string does not end a line, so the `$' anchor
should not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_STARTEND
The string is considered to start at
.Fa string
+
.Fa pmatch[0].rm_so
and to have a terminating NUL located at
.Fa string
+
.Fa pmatch[0].rm_eo
(there need not actually be a NUL at that location),
regardless of the value of
.Fa nmatch .
See below for the definition of
.Fa pmatch
and
.Fa nmatch .
This is an extension, compatible with but not specified by
.St -p1003.2-92 ,
and should be used with caution in software intended to be portable to
other systems.
Note that a non-zero
.Fa rm_so
does not imply
.Dv REG_NOTBOL ;
.Dv REG_STARTEND
affects only the location of the string, not how it is matched.
.El
.Pp
See
.Xr re_format 7
for a discussion of what is matched in situations where an RE or a
portion thereof could match any of several substrings of
.Fa string .
.Pp
Normally,
.Fn regexec
returns 0 for success and the non-zero code
.Dv REG_NOMATCH
for failure.
Other non-zero error codes may be returned in exceptional situations;
see
.Sx DIAGNOSTICS .
.Pp
If
.Dv REG_NOSUB
was specified in the compilation of the RE, or if
.Fa nmatch
is 0,
.Fn regexec
ignores the
.Fa pmatch
argument (but see below for the case where
.Dv REG_STARTEND
is specified).
Otherwise,
.Fa pmatch
points to an array of
.Fa nmatch
structures of type
.Fa regmatch_t .
Such a structure has at least the members
.Fa rm_so
and
.Fa rm_eo ,
both of type
.Fa regoff_t
(a signed arithmetic type at least as large as an
.Fa off_t
and a
.Fa ssize_t ) ,
containing respectively the offset of the first character of a substring
and the offset of the first character after the end of the substring.
Offsets are measured from the beginning of the
.Fa string
argument given to
.Fn regexec .
An empty substring is denoted by equal offsets,
both indicating the character following the empty substring.
.Pp
The 0th member of the
.Fa pmatch
array is filled in to indicate what substring of
.Fa string
was matched by the entire RE.
Remaining members report what substring was matched by parenthesized
subexpressions within the RE;
member
.Fa i
reports subexpression
.Fa i ,
with subexpressions counted (starting at 1) by the order of their
opening parentheses in the RE, left to right.
Unused entries in the array\(emcorresponding either to subexpressions that
did not participate in the match at all, or to subexpressions that do not
exist in the RE (that is,
.Fa i
\*[Gt]
.Fa preg-\*[Gt]re_nsub )
\(emhave both
.Fa rm_so
and
.Fa rm_eo
set to -1.
If a subexpression participated in the match several times,
the reported substring is the last one it matched.
(Note, as an example in particular, that when the RE `(b*)+' matches `bbb',
the parenthesized subexpression matches each of the three `b's and then
an infinite number of empty strings following the last `b',
so the reported substring is one of the empties.)
.Pp
If
.Dv REG_STARTEND
is specified,
.Fa pmatch
must point to at least one
.Fa regmatch_t
(even if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified),
to hold the input offsets for
.Dv REG_STARTEND .
Use for output is still entirely controlled by
.Fa nmatch ;
if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified,
the value of
.Fa pmatch [0]
will not be changed by a successful
.Fn regexec .
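.Pp
The offset reporting through pmatch described above might be used along the
following lines.
This is only a sketch: the helper re_capture, the MAXGROUPS limit, and the
convention of returning -1 for an unused entry are assumptions of this
example, and REG_STARTEND is not exercised because it is an extension.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAXGROUPS 10    /* arbitrary limit chosen for this sketch */

/*
 * Hypothetical helper: copy the text matched by subexpression `group`
 * (0 means the whole match) into `out` and return its length.
 * Returns -1 if the RE does not compile, nothing matches, the entry is
 * unused (both offsets are -1), or `out` is too small.
 */
long
re_capture(const char *pattern, const char *text, size_t group,
    char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[MAXGROUPS];
    long len = -1;

    if (group >= MAXGROUPS || outsz == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, MAXGROUPS, pm, 0) == 0 &&
        pm[group].rm_so != -1) {
        /* rm_eo is the offset just past the end of the substring. */
        size_t n = (size_t)(pm[group].rm_eo - pm[group].rm_so);

        if (n < outsz) {
            memcpy(out, text + pm[group].rm_so, n);
            out[n] = '\0';
            len = (long)n;
        }
    }
    regfree(&re);
    return len;
}
```

With the RE `(a)|(b)' and the string `b', entry 1 did not participate in
the match, so the helper reports -1 for it, while entry 2 yields `b'.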
.Pp
.Fn regerror
maps a non-zero
.Fa errcode
from either
.Fn regcomp
or
.Fn regexec
to a human-readable, printable message.
If
.Fa preg
is non-NULL,
the error code should have arisen from use of the
.Fa regex_t
pointed to by
.Fa preg ,
and if the error code came from
.Fn regcomp ,
it should have been the result from the most recent
.Fn regcomp
using that
.Fa regex_t .
.Po Fn regerror
may be able to supply a more detailed message using information
from the
.Fa regex_t . Pc
.Fn regerror
places the NUL-terminated message into the buffer pointed to by
.Fa errbuf ,
limiting the length (including the NUL) to at most
.Fa errbuf_size
bytes.
If the whole message won't fit,
as much of it as will fit before the terminating NUL is supplied.
In any case,
the returned value is the size of buffer needed to hold the whole
message (including terminating NUL).
If
.Fa errbuf_size
is 0,
.Fa errbuf
is ignored but the return value is still correct.
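.Pp
Because the return value is the buffer size needed even when errbuf_size is
0, a message buffer can be sized with two calls.
The following is a minimal sketch of that idiom; the helper name
re_strerror_alloc and the choice of malloc for storage are assumptions of
this example, not something the library provides.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a freshly allocated copy of the message
 * for `errcode`, or NULL if memory is exhausted.
 * The caller is responsible for calling free() on the result.
 */
char *
re_strerror_alloc(int errcode, const regex_t *preg)
{
    size_t need;
    char *buf;

    /* First call: errbuf_size is 0, so errbuf is ignored. */
    need = regerror(errcode, preg, NULL, 0);

    buf = malloc(need);
    if (buf != NULL) {
        /* Second call: the buffer is large enough for the NUL too. */
        (void)regerror(errcode, preg, buf, need);
    }
    return buf;
}
```

A typical caller would pass the non-zero value returned by regcomp together
with the same regex_t, print the message, and then free it.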
.Pp
If the
.Fa errcode
given to
.Fn regerror
is first ORed with
.Dv REG_ITOA ,
the ``message'' that results is the printable name of the error code,
e.g. ``REG_NOMATCH'',
rather than an explanation thereof.
If
.Fa errcode
is
.Dv REG_ATOI ,
then
.Fa preg
shall be non-NULL and the
.Fa re_endp
member of the structure it points to
must point to the printable name of an error code;
in this case, the result in
.Fa errbuf
is the decimal digits of
the numeric value of the error code
(0 if the name is not recognized).
.Dv REG_ITOA
and
.Dv REG_ATOI
are intended primarily as debugging facilities;
they are extensions, compatible with but not specified by
.St -p1003.2-92 ,
and should be used with caution in software intended to be portable to
other systems.
Be warned also that they are considered experimental and changes are possible.
.Pp
.Fn regfree
frees any dynamically-allocated storage associated with the compiled RE
pointed to by
.Fa preg .
The remaining
.Fa regex_t
is no longer a valid compiled RE
and the effect of supplying it to
.Fn regexec
or
.Fn regerror
is undefined.
.Pp
None of these functions references global variables except for tables
of constants;
all are safe for use from multiple threads if the arguments are safe.
.Sh IMPLEMENTATION CHOICES
There are a number of decisions that
.St -p1003.2-92
leaves up to the implementor,
either by explicitly saying ``undefined'' or by virtue of them being
forbidden by the RE grammar.
This implementation treats them as follows.
.Pp
See
.Xr re_format 7
for a discussion of the definition of case-independent matching.
.Pp
There is no particular limit on the length of REs,
except insofar as memory is limited.
Memory usage is approximately linear in RE size, and largely insensitive
to RE complexity, except for bounded repetitions.
See BUGS for one short RE using them
that will run almost any system out of memory.
.Pp
A backslashed character other than one specifically given a magic meaning
by
.St -p1003.2-92
(such magic meanings occur only in obsolete [``basic''] REs)
is taken as an ordinary character.
.Pp
Any unmatched [ is a
.Dv REG_EBRACK
error.
.Pp
Equivalence classes cannot begin or end bracket-expression ranges.
The endpoint of one range cannot begin another.
.Pp
.Dv RE_DUP_MAX ,
the limit on repetition counts in bounded repetitions, is 255.
.Pp
A repetition operator (?, *, +, or bounds) cannot follow another
repetition operator.
A repetition operator cannot begin an expression or subexpression
or follow `^' or `|'.
.Pp
`|' cannot appear first or last in a (sub)expression or after another `|',
i.e. an operand of `|' cannot be an empty subexpression.
An empty parenthesized subexpression, `()', is legal and matches an
empty (sub)string.
An empty string is not a legal RE.
.Pp
A `{' followed by a digit is considered the beginning of bounds for a
bounded repetition, which must then follow the syntax for bounds.
A `{'
.Em not
followed by a digit is considered an ordinary character.
.Pp
`^' and `$' beginning and ending subexpressions in obsolete (``basic'')
REs are anchors, not ordinary characters.
.Sh DIAGNOSTICS
Non-zero error codes from
.Fn regcomp
and
.Fn regexec
include the following:
.Pp
.Bl -tag -width XXXREG_ECOLLATE -compact
.It Dv REG_NOMATCH
.Fn regexec
failed to match
.It Dv REG_BADPAT
invalid regular expression
.It Dv REG_ECOLLATE
invalid collating element
.It Dv REG_ECTYPE
invalid character class
.It Dv REG_EESCAPE
\e applied to unescapable character
.It Dv REG_ESUBREG
invalid backreference number
.It Dv REG_EBRACK
brackets [ ] not balanced
.It Dv REG_EPAREN
parentheses ( ) not balanced
.It Dv REG_EBRACE
braces { } not balanced
.It Dv REG_BADBR
invalid repetition count(s) in { }
.It Dv REG_ERANGE
invalid character range in [ ]
.It Dv REG_ESPACE
ran out of memory
.It Dv REG_BADRPT
?, *, or + operand invalid
.It Dv REG_EMPTY
empty (sub)expression
.It Dv REG_ASSERT
``can't happen''\(emyou found a bug
.It Dv REG_INVARG
invalid argument, e.g. negative-length string
.El
.Sh SEE ALSO
.Xr grep 1 ,
.Xr sed 1 ,
.Xr re_format 7
.Pp
.St -p1003.2-92 ,
sections 2.8 (Regular Expression Notation)
and
B.5 (C Binding for Regular Expression Matching).
.Sh HISTORY
Originally written by Henry Spencer.
Altered for inclusion in the
.Bx 4.4
distribution.
.Sh BUGS
There is one known functionality bug.
The implementation of internationalization is incomplete:
the locale is always assumed to be the default one of
.St -p1003.2-92 ,
and only the collating elements etc. of that locale are available.
.Pp
The back-reference code is subtle and doubts linger about its correctness
in complex cases.
.Pp
.Fn regexec
performance is poor.
This will improve with later releases.
.Fa nmatch
exceeding 0 is expensive;
.Fa nmatch
exceeding 1 is worse.
.Fn regexec
is largely insensitive to RE complexity
.Em except
that back references are massively expensive.
RE length does matter; in particular, there is a strong speed bonus
for keeping RE length under about 30 characters,
with most special characters counting roughly double.
.Pp
.Fn regcomp
implements bounded repetitions by macro expansion,
which is costly in time and space if counts are large
or bounded repetitions are nested.
An RE like, say,
`((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
will (eventually) run almost any existing machine out of swap space.
.Pp
There are suspected problems with response to obscure error conditions.
Notably,
certain kinds of internal overflow,
produced only by truly enormous REs or by multiply nested bounded repetitions,
are probably not handled well.
.Pp
Due to a mistake in
.St -p1003.2-92 ,
things like `a)b' are legal REs because `)' is a special character
only in the presence of a previous unmatched `('.
This can't be fixed until the spec is fixed.
.Pp
The standard's definition of back references is vague.
For example, does
`a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
Until the standard is clarified, behavior in such cases should not be
relied on.
.Pp
The implementation of word-boundary matching is a bit of a kludge,
and bugs may lurk in combinations of word-boundary matching and anchoring.