.\" $OpenBSD: regex.3,v 1.30 2022/09/11 06:38:10 jmc Exp $
.\"
.\" Copyright (c) 1997, Phillip F Knaack. All rights reserved.
.\"
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
.\" Copyright (c) 1992, 1993, 1994
.\"     The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)regex.3 8.4 (Berkeley) 3/20/94
.\"
.Dd $Mdocdate: September 11 2022 $
.Dt REGEXEC 3
.Os
.Sh NAME
.Nm regcomp ,
.Nm regexec ,
.Nm regerror ,
.Nm regfree
.Nd regular expression routines
.Sh SYNOPSIS
.In sys/types.h
.In regex.h
.Ft int
.Fn regcomp "regex_t *preg" "const char *pattern" "int cflags"
.Pp
.Ft int
.Fn regexec "const regex_t *preg" "const char *string" "size_t nmatch" \
    "regmatch_t pmatch[]" "int eflags"
.Pp
.Ft size_t
.Fn regerror "int errcode" "const regex_t *preg" "char *errbuf" \
    "size_t errbuf_size"
.Pp
.Ft void
.Fn regfree "regex_t *preg"
.Sh DESCRIPTION
These routines implement
.St -p1003.2
regular expressions
.Pq Dq REs ;
see
.Xr re_format 7 .
.Fn regcomp
compiles an RE written as a string into an internal form,
.Fn regexec
matches that internal form against a string and reports results,
.Fn regerror
transforms error codes from either into human-readable messages, and
.Fn regfree
frees any dynamically allocated storage used by the internal form
of an RE.
.Pp
The header
.In regex.h
declares two structure types,
.Vt regex_t
and
.Vt regmatch_t ,
the former for compiled internal forms and the latter for match reporting.
It also declares the four functions,
a type
.Vt regoff_t ,
and a number of constants with names starting with
.Dv REG_ .
.Pp
.Fn regcomp
compiles the regular expression contained in the
.Fa pattern
string,
subject to the flags in
.Fa cflags ,
and places the results in the
.Vt regex_t
structure pointed to by
.Fa preg .
The
.Fa cflags
argument is the bitwise OR of zero or more of the following values:
.Bl -tag -width XREG_EXTENDEDX
.It Dv REG_EXTENDED
Compile modern
.Pq Dq extended
REs,
rather than the obsolete
.Pq Dq basic
REs that are the default.
.It Dv REG_BASIC
This is a synonym for 0,
provided as a counterpart to
.Dv REG_EXTENDED
to improve readability.
.It Dv REG_NOSPEC
Compile with recognition of all special characters turned off.
All characters are thus considered ordinary,
so the RE is a literal string.
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.Dv REG_EXTENDED
and
.Dv REG_NOSPEC
may not be used in the same call to
.Fn regcomp .
.It Dv REG_ICASE
Compile for matching that ignores upper/lower case distinctions.
See
.Xr re_format 7 .
.It Dv REG_NOSUB
Compile for matching that need only report success or failure,
not what was matched.
.It Dv REG_NEWLINE
Compile for newline-sensitive matching.
By default, newline is a completely ordinary character with no special
meaning in either REs or strings.
With this flag,
.Ql \&[^
bracket expressions and
.Ql \&.
never match newline,
a
.Ql ^
anchor matches the null string after any newline in the string
in addition to its normal function,
and the
.Ql $
anchor matches the null string before any newline in the
string in addition to its normal function.
.It Dv REG_PEND
The regular expression ends,
not at the first NUL,
but just before the character pointed to by the
.Fa re_endp
member of the structure pointed to by
.Fa preg .
The
.Fa re_endp
member is of type
.Fa const\ char\ * .
This flag permits inclusion of NULs in the RE;
they are considered ordinary characters.
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.El
.Pp
When successful,
.Fn regcomp
returns 0 and fills in the structure pointed to by
.Fa preg .
One member of that structure
(other than
.Fa re_endp )
is publicized:
.Fa re_nsub ,
of type
.Fa size_t ,
contains the number of parenthesized subexpressions within the RE
(except that the value of this member is undefined if the
.Dv REG_NOSUB
flag was used).
If
.Fn regcomp
fails, it returns a non-zero error code;
see DIAGNOSTICS.
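.Pp
The compilation step described above can be sketched as follows.
This is a minimal illustration rather than a definitive usage pattern; the helper
.Fn count_subexpressions
is hypothetical and not part of the library.
It omits
.Dv REG_NOSUB
on purpose, because that flag would leave
.Fa re_nsub
undefined.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper (not part of the library): compile `pattern` as
 * an extended RE and return the number of parenthesized subexpressions
 * reported in re_nsub, or -1 if regcomp() rejects the pattern.
 */
long
count_subexpressions(const char *pattern)
{
	regex_t re;
	long nsub;

	/* A non-zero return from regcomp() is an error code. */
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	/* re_nsub is defined here because REG_NOSUB was not given. */
	nsub = (long)re.re_nsub;

	/* Release the storage owned by the compiled form. */
	regfree(&re);
	return nsub;
}
```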
.Pp
.Fn regexec
matches the compiled RE pointed to by
.Fa preg
against the
.Fa string ,
subject to the flags in
.Fa eflags ,
and reports results using
.Fa nmatch ,
.Fa pmatch ,
and the returned value.
The RE must have been compiled by a previous invocation of
.Fn regcomp .
The compiled form is not altered during execution of
.Fn regexec ,
so a single compiled RE can be used simultaneously by multiple threads.
.Pp
By default,
the NUL-terminated string pointed to by
.Fa string
is considered to be the text of an entire line, minus any terminating
newline.
The
.Fa eflags
argument is the bitwise OR of zero or more of the following values:
.Bl -tag -width XREG_STARTENDX
.It Dv REG_NOTBOL
The first character of the string is treated as the continuation
of a line.
This means that the anchors
.Ql ^ ,
.Ql [[:<:]] ,
and
.Ql \e<
do not match before it; but see
.Dv REG_STARTEND
below.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_NOTEOL
The NUL terminating
the string
does not end a line, so the
.Ql $
anchor does not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_STARTEND
The string is considered to start at
.Fa string No +
.Fa pmatch Ns [0]. Ns Fa rm_so
and to end before the byte located at
.Fa string No +
.Fa pmatch Ns [0]. Ns Fa rm_eo ,
regardless of the value of
.Fa nmatch .
See below for the definition of
.Fa pmatch
and
.Fa nmatch .
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.Pp
Without
.Dv REG_NOTBOL ,
the position
.Fa rm_so
is considered the beginning of a line, such that
.Ql ^
matches before it, and the beginning of a word if there is a word
character at this position, such that
.Ql [[:<:]]
and
.Ql \e<
match before it.
.Pp
With
.Dv REG_NOTBOL ,
the character at position
.Fa rm_so
is treated as the continuation of a line, and if
.Fa rm_so
is greater than 0, the preceding character is taken into consideration.
If the preceding character is a newline and the regular expression was compiled
with
.Dv REG_NEWLINE ,
.Ql ^
matches before the string; if the preceding character is not a word character
but the string starts with a word character,
.Ql [[:<:]]
and
.Ql \e<
match before the string.
.El
.Pp
See
.Xr re_format 7
for a discussion of what is matched in situations where an RE or a
portion thereof could match any of several substrings of
.Fa string .
.Pp
Normally,
.Fn regexec
returns 0 for success and the non-zero code
.Dv REG_NOMATCH
for failure.
Other non-zero error codes may be returned in exceptional situations;
see DIAGNOSTICS.
.Pp
If
.Dv REG_NOSUB
was specified in the compilation of the RE,
or if
.Fa nmatch
is 0,
.Fn regexec
ignores the
.Fa pmatch
argument (but see below for the case where
.Dv REG_STARTEND
is specified).
Otherwise,
.Fa pmatch
points to an array of
.Fa nmatch
structures of type
.Vt regmatch_t .
Such a structure has at least the members
.Fa rm_so
and
.Fa rm_eo ,
both of type
.Fa regoff_t
(a signed arithmetic type at least as large as an
.Vt off_t
and a
.Vt ssize_t ) ,
containing respectively the offset of the first character of a substring
and the offset of the first character after the end of the substring.
Offsets are measured from the beginning of the
.Fa string
argument given to
.Fn regexec .
An empty substring is denoted by equal offsets,
both indicating the character following the empty substring.
.Pp
The 0th member of the
.Fa pmatch
array is filled in to indicate what substring of
.Fa string
was matched by the entire RE.
Remaining members report what substring was matched by parenthesized
subexpressions within the RE;
member
.Va i
reports subexpression
.Va i ,
with subexpressions counted (starting at 1) by the order of their opening
parentheses in the RE, left to right.
Unused entries in the array\(emcorresponding either to subexpressions that
did not participate in the match at all, or to subexpressions that do not
exist in the RE (that is, \fIi\fR\ > \fIpreg\fR\->\fIre_nsub\fR)\(emhave both
.Fa rm_so
and
.Fa rm_eo
set to \-1.
If a subexpression participated in the match several times,
the reported substring is the last one it matched.
(Note, as an example in particular, that when the RE
.Dq (b*)+
matches
.Dq bbb ,
the parenthesized subexpression matches each of the three
.Sq b Ns s
and then
an infinite number of empty strings following the last
.Sq b ,
so the reported substring is one of the empties.)
.Pp
If
.Dv REG_STARTEND
is specified,
.Fa pmatch
must point to at least one
.Vt regmatch_t
(even if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified),
to hold the input offsets for
.Dv REG_STARTEND .
Use for output is still entirely controlled by
.Fa nmatch ;
if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified,
the value of
.Fa pmatch[0]
will not be changed by a successful
.Fn regexec .
.Pp
.Fn regerror
maps a non-zero
.Va errcode
from either
.Fn regcomp
or
.Fn regexec
to a human-readable, printable message.
If
.Fa preg
is non-NULL,
the error code should have arisen from use of
the
.Vt regex_t
pointed to by
.Fa preg ,
and if the error code came from
.Fn regcomp ,
it should have been the result from the most recent
.Fn regcomp
using that
.Vt regex_t .
.Pf ( Fn regerror
may be able to supply a more detailed message using information
from the
.Vt regex_t . )
.Fn regerror
places the NUL-terminated message into the buffer pointed to by
.Fa errbuf ,
limiting the length (including the NUL) to at most
.Fa errbuf_size
bytes.
If the whole message won't fit,
as much of it as will fit before the terminating NUL is supplied.
In any case,
the returned value is the size of buffer needed to hold the whole
message (including the terminating NUL).
If
.Fa errbuf_size
is 0,
.Fa errbuf
is ignored but the return value is still correct.
.Pp
If the
.Fa errcode
given to
.Fn regerror
is first OR'ed with
.Dv REG_ITOA ,
the
.Dq message
that results is the printable name of the error code,
e.g.,
.Dq REG_NOMATCH ,
rather than an explanation thereof.
If
.Fa errcode
is
.Dv REG_ATOI ,
then
.Fa preg
shall be non-null and the
.Fa re_endp
member of the structure it points to
must point to the printable name of an error code;
in this case, the result in
.Fa errbuf
is the decimal digits of
the numeric value of the error code
(0 if the name is not recognized).
.Dv REG_ITOA
and
.Dv REG_ATOI
are intended primarily as debugging facilities;
they are extensions,
compatible with but not specified by
.St -p1003.2
and should be used with
caution in software intended to be portable to other systems.
Be warned also that they are considered experimental and changes are possible.
.Pp
.Fn regfree
frees any dynamically allocated storage associated with the compiled RE
pointed to by
.Fa preg .
The remaining
.Vt regex_t
is no longer a valid compiled RE
and the effect of supplying it to
.Fn regexec
or
.Fn regerror
is undefined.
.Pp
None of these functions references global variables except for tables
of constants;
all are safe for use from multiple threads if the arguments are safe.
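.Pp
The match-reporting and error-reporting conventions described above
can be sketched as follows.
This is an illustration under stated assumptions, not a definitive implementation:
the helpers
.Fn copy_submatch
and
.Fn re_strerror
and the fixed limit of 10 match slots are hypothetical,
and only the portable flags are used.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define NSLOTS 10	/* assumed upper bound on reported slots */

/*
 * Hypothetical helper: copy the text matched by subexpression `idx`
 * (0 means the whole RE) into `out`, truncating to fit.
 * Returns 1 on success, 0 if compilation fails, nothing matches,
 * or the subexpression did not participate in the match.
 */
int
copy_submatch(const char *pattern, const char *string, size_t idx,
    char *out, size_t outsize)
{
	regex_t re;
	regmatch_t pm[NSLOTS];
	size_t len;
	int rc;

	if (idx >= NSLOTS || outsize == 0)
		return 0;
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return 0;
	rc = regexec(&re, string, NSLOTS, pm, 0);
	regfree(&re);
	if (rc != 0)			/* REG_NOMATCH or another error */
		return 0;
	if (pm[idx].rm_so == -1)	/* unused entry */
		return 0;

	/* rm_eo is the offset of the first character after the match. */
	len = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);
	if (len >= outsize)
		len = outsize - 1;
	memcpy(out, string + pm[idx].rm_so, len);
	out[len] = '\0';
	return 1;
}

/*
 * Hypothetical helper: return a malloc'd copy of the message for
 * `errcode`, or NULL if allocation fails.  The first regerror() call
 * passes a size of 0, so errbuf is ignored and only the required
 * buffer size (including the NUL) is returned.
 */
char *
re_strerror(int errcode, const regex_t *preg)
{
	size_t need;
	char *buf;

	need = regerror(errcode, preg, NULL, 0);
	buf = malloc(need);
	if (buf != NULL)
		regerror(errcode, preg, buf, need);
	return buf;
}
```

The caller is responsible for passing the result of
.Fn re_strerror
to
.Xr free 3 .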
.Sh IMPLEMENTATION CHOICES
There are a number of decisions that
.St -p1003.2
leaves up to the implementor,
either by explicitly saying
.Dq undefined
or by virtue of them being
forbidden by the RE grammar.
This implementation treats them as follows.
.Pp
See
.Xr re_format 7
for a discussion of the definition of case-independent matching.
.Pp
There is no particular limit on the length of REs,
except insofar as memory is limited.
Memory usage is approximately linear in RE size, and largely insensitive
to RE complexity, except for bounded repetitions.
See
.Sx BUGS
for one short RE using them
that will run almost any system out of memory.
.Pp
A backslashed character other than one specifically given a magic meaning
by
.St -p1003.2
(such magic meanings occur only in obsolete REs)
is taken as an ordinary character.
.Pp
Any unmatched
.Ql \&[
is a
.Dv REG_EBRACK
error.
.Pp
Equivalence classes cannot begin or end bracket-expression ranges.
The endpoint of one range cannot begin another.
.Pp
RE_DUP_MAX, the limit on repetition counts in bounded repetitions, is 255.
.Pp
A repetition operator (?, *, +, or bounds) cannot follow another
repetition operator.
A repetition operator cannot begin an expression or subexpression
or follow
.Ql ^
or
.Ql | .
.Pp
A
.Ql |
cannot appear first or last in a (sub)expression, or after another
.Ql | ,
i.e., an operand of
.Ql |
cannot be an empty subexpression.
An empty parenthesized subexpression,
.Ql \&(\&) ,
is legal and matches an
empty (sub)string.
An empty string is not a legal RE.
.Pp
A
.Ql {
followed by a digit is considered the beginning of bounds for a
bounded repetition, which must then follow the syntax for bounds.
A
.Ql {
.Em not
followed by a digit is considered an ordinary character.
.Pp
.Ql ^
and
.Ql $
beginning and ending subexpressions in obsolete
.Pq Dq basic
REs are anchors, not ordinary characters.
.Sh DIAGNOSTICS
Non-zero error codes from
.Fn regcomp
and
.Fn regexec
include the following:
.Pp
.Bl -tag -compact -width XREG_ECOLLATEX
.It Er REG_NOMATCH
.Fn regexec
failed to match
.It Er REG_BADPAT
invalid regular expression
.It Er REG_ECOLLATE
invalid collating element
.It Er REG_ECTYPE
invalid character class
.It Er REG_EESCAPE
\e applied to unescapable character
.It Er REG_ESUBREG
invalid backreference number
.It Er REG_EBRACK
brackets [ ] not balanced
.It Er REG_EPAREN
parentheses ( ) not balanced
.It Er REG_EBRACE
braces { } not balanced
.It Er REG_BADBR
invalid repetition count(s) in { }
.It Er REG_ERANGE
invalid character range in [ ]
.It Er REG_ESPACE
ran out of memory
.It Er REG_BADRPT
?, *, or + operand invalid
.It Er REG_EMPTY
empty (sub)expression
.It Er REG_ASSERT
.Dq can't happen
\(emyou found a bug
.It Er REG_INVARG
invalid argument, e.g., negative-length string
.El
.Sh SEE ALSO
.Xr grep 1 ,
.Xr re_format 7
.Pp
.St -p1003.2 ,
sections 2.8 (Regular Expression Notation)
and
B.5 (C Binding for Regular Expression Matching).
.Sh HISTORY
Predecessors called
.Fn regcmp
and
.Fn regex
first appeared in PWB/UNIX 1.0.
.Pp
Predecessors
.Fn re_comp
and
.Fn re_exec
first appeared in
.Bx 4.0 ,
became part of
.In unistd.h
in
.Bx 4.4 ,
and were deleted after
.Ox 5.4 .
.Pp
Functions called
.Fn regcomp ,
.Fn regexec ,
.Fn regerror ,
and
.Fn regsub
first appeared in Version\~8
.At ,
were reimplemented and declared in
.In regexp.h
for
.Bx 4.3 Tahoe ,
and were also deleted after
.Ox 5.4 .
.Pp
Taking different arguments, the POSIX
.In regex.h
functions
.Fn regcomp ,
.Fn regexec ,
.Fn regerror ,
and
.Fn regfree
appeared in
.Bx 4.4 .
.Sh AUTHORS
.An -nosplit
The
Version\~8
.At
code was implemented by
.An Rob Pike
and extracted into a library by
.An Dave Presotto .
The
.Bx 4.3 Tahoe
and
.Bx 4.4
versions were both written by
.An Henry Spencer .
.Sh BUGS
The implementation of internationalization is incomplete:
the locale is always assumed to be the default one of
.St -p1003.2 ,
and only the collating elements etc. of that locale are available.
.Pp
The back-reference code is subtle and doubts linger about its correctness
in complex cases.
.Pp
.Fn regexec
performance is poor.
This will improve with later releases.
.Fa nmatch
exceeding 0 is expensive;
.Fa nmatch
exceeding 1 is worse.
.Fn regexec
is largely insensitive to RE complexity
.Em except
that back references are massively expensive.
RE length does matter; in particular, there is a strong speed bonus
for keeping RE length under about 30 characters,
with most special characters counting roughly double.
.Pp
.Fn regcomp
implements bounded repetitions by macro expansion,
which is costly in time and space if counts are large
or bounded repetitions are nested.
An RE like, say,
.Dq ((((a{1,100}){1,100}){1,100}){1,100}){1,100}
will (eventually) run almost any existing machine out of swap space.
.Pp
There are suspected problems with response to obscure error conditions.
Notably,
certain kinds of internal overflow,
produced only by truly enormous REs or by multiply nested bounded repetitions,
are probably not handled well.
.Pp
Due to a mistake in
.St -p1003.2 ,
things like
.Ql a)b
are legal REs because
.Ql \&)
is
a special character only in the presence of a previous unmatched
.Ql \&( .
This can't be fixed until the spec is fixed.
.Pp
The standard's definition of back references is vague.
For example, does
.Dq a\e(\e(b\e)*\e2\e)*d
match
.Dq abbbd ?
Until the standard is clarified,
behavior in such cases should not be relied on.
.Pp
The implementation of word-boundary matching is a bit of a kludge,
and bugs may lurk in combinations of word-boundary matching and anchoring.