.\"	$OpenBSD: patterns.7,v 1.8 2023/11/08 11:17:20 deraadt Exp $
.\"
.\" Copyright (c) 2015 Reyk Floeter <reyk@openbsd.org>
.\" Copyright (C) 1994-2015 Lua.org, PUC-Rio.
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining
.\" a copy of this software and associated documentation files (the
.\" "Software"), to deal in the Software without restriction, including
.\" without limitation the rights to use, copy, modify, merge, publish,
.\" distribute, sublicense, and/or sell copies of the Software, and to
.\" permit persons to whom the Software is furnished to do so, subject to
.\" the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be
.\" included in all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
.\" EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
.\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
.\" IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
.\" CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
.\" TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
.\" SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
.\"
.\" Derived from section 6.4.1 in manual.html of Lua 5.3.1:
.\" $Id: patterns.7,v 1.8 2023/11/08 11:17:20 deraadt Exp $
.\"
.Dd $Mdocdate: November 8 2023 $
.Dt PATTERNS 7
.Os
.Sh NAME
.Nm patterns
.Nd Lua's pattern matching rules
.Sh DESCRIPTION
Pattern matching in
.Xr httpd 8
is based on the implementation of the Lua scripting language and
provides a simple and fast alternative to the regular expressions (REs) that
are described in
.Xr re_format 7 .
Patterns are described by regular strings, which are interpreted as
patterns by the pattern-matching
.Dq find
and
.Dq match
functions.
This document describes the syntax and the meaning (that is, what they
match) of these strings.
.Sh CHARACTER CLASS
A character class is used to represent a set of characters.
The following combinations are allowed in describing a character
class:
.Bl -tag -width Ds
.It Ar x
(where
.Ar x
is not one of the magic characters
.Sq ^$()%.[]*+-? )
represents the character
.Ar x
itself.
.It .
(a dot) represents all characters.
.It %a
represents all letters.
.It %c
represents all control characters.
.It %d
represents all digits.
.It %g
represents all printable characters except space.
.It %l
represents all lowercase letters.
.It %p
represents all punctuation characters.
.It %s
represents all space characters.
.It %u
represents all uppercase letters.
.It %w
represents all alphanumeric characters.
.It %x
represents all hexadecimal digits.
.It Pf % Ar x
(where
.Ar x
is any non-alphanumeric character) represents the character
.Ar x .
This is the standard way to escape the magic characters.
Any non-alphanumeric character (including all punctuation characters,
even the non-magical) can be preceded by a
.Sq %
when used to represent itself in a pattern.
.It Bq Ar set
represents the class which is the union of all
characters in
.Ar set .
A range of characters can be specified by separating the end
characters of the range, in ascending order, with a
.Sq - .
All classes
.Sq Ar %x
described above can also be used as components in
.Ar set .
All other characters in
.Ar set
represent themselves.
For example,
.Sq [%w_]
(or
.Sq [_%w] )
represents all alphanumeric characters plus the underscore,
.Sq [0-7]
represents the octal digits,
and
.Sq [0-7%l%-]
represents the octal digits plus the lowercase letters plus the
.Sq -
character.
.Pp
The interaction between ranges and classes is not defined.
Therefore, patterns like
.Sq [%a-z]
or
.Sq [a-%%]
have no meaning.
.It Bq Ar ^set
represents the complement of
.Ar set ,
where
.Ar set
is interpreted as above.
.El
.Pp
For all classes represented by single letters (
.Sq %a ,
.Sq %c ,
etc.),
the corresponding uppercase letter represents the complement of the class.
For instance,
.Sq %S
represents all non-space characters.
.Pp
The definitions of letter, space, and other character groups depend on
the current locale.
In particular, the class
.Sq [a-z]
may not be equivalent to
.Sq %l .
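.Pp
The single-letter classes and their uppercase complements can be
illustrated outside of
.Xr httpd 8 .
The following Python sketch is not part of httpd and is not how the
matcher is implemented; the names
.Va CLASSES
and
.Fn class_matches
are hypothetical.
It assumes plain ASCII (the
.Dq C
locale), so its tables may differ from what another locale reports, and
it only covers the class letters listed above.
.Bd -literal -offset indent
```python
import string

# ASCII-only approximations of the single-letter classes (an assumption;
# the real definitions depend on the current locale).
CLASSES = {
    "a": set(string.ascii_letters),
    "c": {chr(i) for i in range(32)} | {chr(127)},
    "d": set(string.digits),
    "g": {chr(i) for i in range(33, 127)},
    "l": set(string.ascii_lowercase),
    "p": set(string.punctuation),
    "s": set(string.whitespace),
    "u": set(string.ascii_uppercase),
    "w": set(string.ascii_letters + string.digits),
    "x": set(string.hexdigits),
}


def class_matches(letter, ch):
    """Return True if the one-character string ch is in class %letter."""
    inside = ch in CLASSES[letter.lower()]
    # An uppercase class letter selects the complement of the class.
    return not inside if letter.isupper() else inside
```
.Ed
For example, the class letter S applied to the character x yields true,
because x is not a space character, while the class letter w applied to
an underscore yields false.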
.Sh PATTERN ITEM
A pattern item can be
.Bl -bullet
.It
a single character class, which matches any single character in the class;
.It
a single character class followed by
.Sq * ,
which matches zero or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;
.It
a single character class followed by
.Sq + ,
which matches one or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;
.It
a single character class followed by
.Sq - ,
which also matches zero or more repetitions of characters in the class.
Unlike
.Sq * ,
these repetition items will always match the shortest possible sequence;
.It
a single character class followed by
.Sq \&? ,
which matches zero or one occurrence of a character in the class.
It always matches one occurrence if possible;
.It
.Sq Pf % Ar n ,
for
.Ar n
between 1 and 9;
such an item matches a substring equal to the n-th captured string (see below);
.It
.Sq Pf %b Ar xy ,
where
.Ar x
and
.Ar y
are two distinct characters;
such an item matches strings that start with
.Ar x ,
end with
.Ar y ,
and where the
.Ar x
and
.Ar y
are
.Em balanced .
This means that if one reads the string from left to right, counting
.Em +1
for an
.Ar x
and
.Em -1
for a
.Ar y ,
the ending
.Ar y
is the first
.Ar y
where the count reaches 0.
For instance, the item
.Sq %b()
matches expressions with balanced parentheses.
.It
.Sq Pf %f Bq Ar set ,
a
.Em frontier pattern ;
such an item matches an empty string at any position such that the next
character belongs to
.Ar set
and the previous character does not belong to
.Ar set .
The set
.Ar set
is interpreted as previously described.
The beginning and the end of the subject are handled as if
they were the character
.Sq \e0 .
.El
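.Pp
The counting rule for
.Sq %b
and the neighbour test for
.Sq %f
can be sketched in a few lines of Python.
This is an illustration of the rules as stated above, not the code used by
.Xr httpd 8 ;
the function names
.Fn match_balanced
and
.Fn frontier_positions
are hypothetical, offsets are counted from 0, and the
.Fa members
argument is assumed to be the already expanded
.Ar set .
.Bd -literal -offset indent
```python
def match_balanced(subject, start, opener, closer):
    """Return the offset just past a balanced run starting at start.

    Returns None if subject[start] is not the opener or if the count
    never returns to 0.
    """
    if start >= len(subject) or subject[start] != opener:
        return None
    depth = 1  # the opener at start counts +1
    for i in range(start + 1, len(subject)):
        if subject[i] == closer:
            depth -= 1
            if depth == 0:
                return i + 1  # first closer where the count reaches 0
        elif subject[i] == opener:
            depth += 1
    return None


def frontier_positions(subject, members):
    """Return every offset where a frontier for members would match."""
    nul = chr(0)  # both ends of the subject behave like a NUL character
    hits = []
    for i in range(len(subject) + 1):
        prev = subject[i - 1] if i > 0 else nul
        nxt = subject[i] if i < len(subject) else nul
        if prev not in members and nxt in members:
            hits.append(i)
    return hits
```
.Ed
Applied to the subject (a(b)c)d at offset 0 with parentheses as the two
characters, the first function returns 7, the offset just past the
closing parenthesis that balances the first one.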
.Sh PATTERN
A pattern is a sequence of pattern items.
A caret
.Sq ^
at the beginning of a pattern anchors the match at the beginning of
the subject string.
A
.Sq $
at the end of a pattern anchors the match at the end of the subject string.
At other positions,
.Sq ^
and
.Sq $
have no special meaning and represent themselves.
.Sh CAPTURES
A pattern can contain sub-patterns enclosed in parentheses; they
describe captures.
When a match succeeds, the substrings of the subject string that match
captures are stored (captured) for future use.
Captures are numbered according to their left parentheses.
For instance, in the pattern
.Qq (a*(.)%w(%s*)) ,
the part of the string matching
.Qq a*(.)%w(%s*)
is stored as the first capture (and therefore has number 1);
the character matching
.Qq \&.
is captured with number 2,
and the part matching
.Qq %s*
has number 3.
.Pp
As a special case, the empty capture
.Sq ()
captures the current string position (a number).
For instance, if we apply the pattern
.Qq ()aa()
on the string
.Qq flaaap ,
there will be two captures: 2 and 4.
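.Pp
The two position captures in the example above can be reproduced with a
short Python sketch.
It is an illustration only and does not call the matcher in
.Xr httpd 8 ;
the helper
.Fn position_captures
is hypothetical and handles just a plain-text literal placed between two
empty captures.
Positions are counted from 0, which yields 2 and 4; Lua itself counts
from 1 and would report 3 and 5 for the same pattern and string.
.Bd -literal -offset indent
```python
def position_captures(subject, literal):
    """Emulate '()literal()' for a plain literal; offsets start at 0."""
    start = subject.find(literal)  # leftmost occurrence, as a matcher finds
    if start < 0:
        return None
    # First capture: where the literal begins; second: just past its end.
    return (start, start + len(literal))


c_style = position_captures("flaaap", "aa")
# Lua-style positions are the same offsets shifted by one.
lua_style = tuple(p + 1 for p in c_style)
```
.Ed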
.Sh SEE ALSO
.Xr fnmatch 3 ,
.Xr re_format 7 ,
.Xr httpd 8
.Rs
.%A Roberto Ierusalimschy
.%A Luiz Henrique de Figueiredo
.%A Waldemar Celes
.%Q Lua.org
.%Q PUC-Rio
.%D June 2015
.%R Lua 5.3 Reference Manual
.%T Patterns
.%U https://www.lua.org/manual/5.3/manual.html#6.4.1
.Re
.Sh HISTORY
The first implementation of the pattern rules was introduced with Lua 2.5.
Almost twenty years later,
an implementation based on Lua 5.3.1 appeared in
.Ox 5.8 .
.Sh AUTHORS
The pattern matching is derived from the original implementation of
the Lua scripting language written by
.An -nosplit
.An Roberto Ierusalimschy ,
.An Waldemar Celes ,
and
.An Luiz Henrique de Figueiredo
at PUC-Rio.
It was turned into a native C API for
.Xr httpd 8
by
.An Reyk Floeter Aq Mt reyk@openbsd.org .
.Sh CAVEATS
A notable difference from the Lua implementation is the position in the string
returned by captures.
It follows C-style indexing (positions start at 0)
instead of Lua-style indexing (positions start at 1).