.\" $OpenBSD: patterns.7,v 1.8 2023/11/08 11:17:20 deraadt Exp $
.\"
.\" Copyright (c) 2015 Reyk Floeter <reyk@openbsd.org>
.\" Copyright (C) 1994-2015 Lua.org, PUC-Rio.
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining
.\" a copy of this software and associated documentation files (the
.\" "Software"), to deal in the Software without restriction, including
.\" without limitation the rights to use, copy, modify, merge, publish,
.\" distribute, sublicense, and/or sell copies of the Software, and to
.\" permit persons to whom the Software is furnished to do so, subject to
.\" the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be
.\" included in all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
.\" EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
.\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
.\" IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
.\" CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
.\" TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
.\" SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
.\"
.\" Derived from section 6.4.1 in manual.html of Lua 5.3.1:
.\" $Id: patterns.7,v 1.8 2023/11/08 11:17:20 deraadt Exp $
.\"
.Dd $Mdocdate: November 8 2023 $
.Dt PATTERNS 7
.Os
.Sh NAME
.Nm patterns
.Nd Lua's pattern matching rules
.Sh DESCRIPTION
Pattern matching in
.Xr httpd 8
is based on the implementation of the Lua scripting language and
provides a simple and fast alternative to the regular expressions (REs) that
are described in
.Xr re_format 7 .
Patterns are described by regular strings, which are interpreted as
patterns by the pattern-matching
.Dq find
and
.Dq match
functions.
This document describes the syntax and the meaning (that is, what they
match) of these strings.
.Sh CHARACTER CLASS
A character class is used to represent a set of characters.
The following combinations are allowed in describing a character
class:
.Bl -tag -width Ds
.It Ar x
(where
.Ar x
is not one of the magic characters
.Sq ^$()%.[]*+-? )
represents the character
.Ar x
itself.
.It .
(a dot) represents all characters.
.It %a
represents all letters.
.It %c
represents all control characters.
.It %d
represents all digits.
.It %g
represents all printable characters except space.
.It %l
represents all lowercase letters.
.It %p
represents all punctuation characters.
.It %s
represents all space characters.
.It %u
represents all uppercase letters.
.It %w
represents all alphanumeric characters.
.It %x
represents all hexadecimal digits.
.It Pf % Ar x
(where
.Ar x
is any non-alphanumeric character) represents the character
.Ar x .
This is the standard way to escape the magic characters.
Any non-alphanumeric character (including all punctuation characters,
even the non-magical) can be preceded by a
.Sq %
when used to represent itself in a pattern.
.It Bq Ar set
represents the class which is the union of all
characters in
.Ar set .
A range of characters can be specified by separating the end
characters of the range, in ascending order, with a
.Sq - .
All classes
.Sq Ar %x
described above can also be used as components in
.Ar set .
All other characters in
.Ar set
represent themselves.
For example,
.Sq [%w_]
(or
.Sq [_%w] )
represents all alphanumeric characters plus the underscore,
.Sq [0-7]
represents the octal digits,
and
.Sq [0-7%l%-]
represents the octal digits plus the lowercase letters plus the
.Sq -
character.
.Pp
The interaction between ranges and classes is not defined.
Therefore, patterns like
.Sq [%a-z]
or
.Sq [a-%%]
have no meaning.
.It Bq Ar ^set
represents the complement of
.Ar set ,
where
.Ar set
is interpreted as above.
.El
.Pp
For all classes represented by single letters (
.Sq %a ,
.Sq %c ,
etc.),
the corresponding uppercase letter represents the complement of the class.
For instance,
.Sq %S
represents all non-space characters.
.Pp
The definitions of letter, space, and other character groups depend on
the current locale.
In particular, the class
.Sq [a-z]
may not be equivalent to
.Sq %l .
.Sh PATTERN ITEM
A pattern item can be
.Bl -bullet
.It
a single character class, which matches any single character in the class;
.It
a single character class followed by
.Sq * ,
which matches zero or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;
.It
a single character class followed by
.Sq + ,
which matches one or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;
.It
a single character class followed by
.Sq - ,
which also matches zero or more repetitions of characters in the class.
Unlike
.Sq * ,
these repetition items will always match the shortest possible sequence;
.It
a single character class followed by
.Sq \&? ,
which matches zero or one occurrence of a character in the class.
It always matches one occurrence if possible;
.It
.Sq Pf % Ar n ,
for
.Ar n
between 1 and 9;
such item matches a substring equal to the n-th captured string (see below);
.It
.Sq Pf %b Ar xy ,
where
.Ar x
and
.Ar y
are two distinct characters;
such item matches strings that start with
.Ar x ,
end with
.Ar y ,
and where the
.Ar x
and
.Ar y
are
.Em balanced .
This means that if one reads the string from left to right, counting
.Em +1
for an
.Ar x
and
.Em -1
for a
.Ar y ,
the ending
.Ar y
is the first
.Ar y
where the count reaches 0.
For instance, the item
.Sq %b()
matches expressions with balanced parentheses.
.It
.Sq Pf %f Bq Ar set ,
a
.Em frontier pattern ;
such item matches an empty string at any position such that the next
character belongs to
.Ar set
and the previous character does not belong to
.Ar set .
The set
.Ar set
is interpreted as previously described.
The beginning and the end of the subject are handled as if
they were the character
.Sq \e0 .
.El
.Sh PATTERN
A pattern is a sequence of pattern items.
A caret
.Sq ^
at the beginning of a pattern anchors the match at the beginning of
the subject string.
A
.Sq $
at the end of a pattern anchors the match at the end of the subject string.
At other positions,
.Sq ^
and
.Sq $
have no special meaning and represent themselves.
.Sh CAPTURES
A pattern can contain sub-patterns enclosed in parentheses; they
describe captures.
When a match succeeds, the substrings of the subject string that match
captures are stored (captured) for future use.
Captures are numbered according to their left parentheses.
For instance, in the pattern
.Qq (a*(.)%w(%s*)) ,
the part of the string matching
.Qq a*(.)%w(%s*)
is stored as the first capture (and therefore has number 1);
the character matching
.Qq \&.
is captured with number 2,
and the part matching
.Qq %s*
has number 3.
.Pp
As a special case, the empty capture
.Sq ()
captures the current string position (a number).
For instance, if we apply the pattern
.Qq ()aa()
on the string
.Qq flaaap ,
there will be two captures: 2 and 4.
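.Sh EXAMPLES
The following is a minimal sketch of how anchors, character classes and
captures might be combined in
.Xr httpd.conf 5 ;
it is an illustration rather than a tested configuration, and the server
name and paths are hypothetical.
It assumes that captures from an enclosing
.Ic location match
block can be referenced as
.Sq %1
and
.Sq %2
in the URI of a
.Ic block return
option, as described in
.Xr httpd.conf 5 :
.Bd -literal -offset indent
server "www.example.com" {
	listen on * port 80

	# ^ and $ anchor the match; (%w+) is capture 1, (%d+) is capture 2.
	location match "^/user/(%w+)/(%d+)$" {
		block return 302 "/profile?name=%1&id=%2"
	}
}
.Ed
.Pp
With this pattern, a request path such as
.Pa /user/alice/42
would yield the captures
.Qq alice
and
.Qq 42 ;
a path such as
.Pa /user/alice/
would not match, because
.Sq %d+
requires at least one digit.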
.Sh SEE ALSO
.Xr fnmatch 3 ,
.Xr re_format 7 ,
.Xr httpd 8
.Rs
.%A Roberto Ierusalimschy
.%A Luiz Henrique de Figueiredo
.%A Waldemar Celes
.%Q Lua.org
.%Q PUC-Rio
.%D June 2015
.%R Lua 5.3 Reference Manual
.%T Patterns
.%U https://www.lua.org/manual/5.3/manual.html#6.4.1
.Re
.Sh HISTORY
The first implementation of the pattern rules was introduced with Lua 2.5.
Almost twenty years later,
an implementation based on Lua 5.3.1 appeared in
.Ox 5.8 .
.Sh AUTHORS
The pattern matching is derived from the original implementation of
the Lua scripting language written by
.An -nosplit
.An Roberto Ierusalimschy ,
.An Waldemar Celes ,
and
.An Luiz Henrique de Figueiredo
at PUC-Rio.
It was turned into a native C API for
.Xr httpd 8
by
.An Reyk Floeter Aq Mt reyk@openbsd.org .
.Sh CAVEATS
A notable difference from the Lua implementation is the position in the string
returned by captures.
It follows C-style indexing (position starting from 0)
instead of Lua-style indexing (position starting from 1).