.\" $NetBSD: regex.3,v 1.35 2024/09/24 14:10:43 uwe Exp $
.\"
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
.\" Copyright (c) 1992, 1993, 1994
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Henry Spencer.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)regex.3	8.4 (Berkeley) 3/20/94
.\" $FreeBSD: head/lib/libc/regex/regex.3 363817 2020-08-04 02:06:49Z kevans $
.\"
.Dd September 21, 2024
.Dt REGEX 3
.Os
.
.Sh NAME
.Nm regcomp ,
.Nm regexec ,
.Nm regerror ,
.Nm regfree ,
.Nm regasub ,
.Nm regnsub
.
.Nd regular-expression library
.
.Sh LIBRARY
.Lb libc
.
.Sh SYNOPSIS
.
.In regex.h
.
.Ft int
.Fo regcomp
.Fa "regex_t * restrict preg"
.Fa "const char * restrict pattern"
.Fa "int cflags"
.Fc
.
.Ft int
.Fo regexec
.Fa "const regex_t * restrict preg"
.Fa "const char * restrict string"
.Fa "size_t nmatch"
.Fa "regmatch_t pmatch[restrict]"
.Fa "int eflags"
.Fc
.Ft size_t
.Fo regerror
.Fa "int errcode"
.Fa "const regex_t * restrict preg"
.Fa "char * restrict errbuf"
.Fa "size_t errbuf_size"
.Fc
.
.Ft void
.Fn regfree "regex_t *preg"
.
.Ft ssize_t
.Fo regnsub
.Fa "char *buf"
.Fa "size_t bufsiz"
.Fa "const char *sub"
.Fa "const regmatch_t *rm"
.Fa "const char *str"
.Fc
.
.Ft ssize_t
.Fo regasub
.Fa "char **buf"
.Fa "const char *sub"
.Fa "const regmatch_t *rm"
.Fa "const char *str"
.Fc
.
.Sh DESCRIPTION
These routines implement
.St -p1003.2
regular expressions
.Pq Do Tn RE Dc Ns s ;
see
.Xr re_format 7 .
The
.Fn regcomp
function
compiles an RE written as a string into an internal form,
.Fn regexec
matches that internal form against a string and reports results,
.Fn regerror
transforms error codes from either into human-readable messages,
and
.Fn regfree
frees any dynamically-allocated storage used by the internal form
of an RE.
.Pp
The header
.In regex.h
declares two structure types,
.Ft regex_t
and
.Ft regmatch_t ,
the former for compiled internal forms and the latter for match reporting.
It also declares the functions listed in the
.Sx SYNOPSIS ,
a type
.Ft regoff_t ,
and a number of constants with names starting with
.Ql REG_ .
.Pp
The
.Fn regcomp
function
compiles the regular expression contained in the
.Fa pattern
string,
subject to the flags in
.Fa cflags ,
and places the results in the
.Ft regex_t
structure pointed to by
.Fa preg .
The
.Fa cflags
argument
is the bitwise
.Em or
of zero or more of the following flags:
.Bl -tag -width Dv
.
.It Dv REG_EXTENDED
Compile modern
.Pq Dq extended
REs,
rather than the obsolete
.Pq Dq basic
REs that
are the default.
.
.It Dv REG_BASIC
This is a synonym for 0,
provided as a counterpart to
.Dv REG_EXTENDED
to improve readability.
.
.It Dv REG_NOSPEC
Compile with recognition of all special characters turned off.
All characters are thus considered ordinary,
so the
.Dq RE
is a literal string.
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.Dv REG_EXTENDED
and
.Dv REG_NOSPEC
may not be used
in the same call to
.Fn regcomp .
.
.It Dv REG_ICASE
Compile for matching that ignores upper\|/\^lower case distinctions.
See
.Xr re_format 7 .
.
.It Dv REG_NOSUB
Compile for matching that need only report success or failure,
not what was matched.
.
.It Dv REG_NEWLINE
Compile for newline-sensitive matching.
By default, newline is a completely ordinary character with no special
meaning in either REs or strings.
With this flag,
.Ql \&[^
bracket expressions and
.Ql \&.
never match newline,
a
.Ql \&^
anchor matches the null string after any newline in the string
in addition to its normal function,
and the
.Ql \&$
anchor matches the null string before any newline in the
string in addition to its normal function.
.
.It Dv REG_PEND
The regular expression ends,
not at the first
.Tn NUL ,
but just before the character pointed to by the
.Fa re_endp
member of the structure pointed to by
.Fa preg .
The
.Fa re_endp
member is of type
.Ft "const char *" .
This flag permits inclusion of
.Tn NUL Ns s
in the RE;
they are considered ordinary characters.
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.
.It Dv REG_GNU
Include GNU-inspired extensions:
.Pp
.Bl -tag -offset indent -width Ds -compact
.It Ic \e Ns Ar N
Use backreference
.Ar N ,
where
.Ar N
is a single digit from 1 to 9.
.It Ic \ea
Bell (alert)
.It Ic \eb
Match a position that is a word boundary.
.It Ic \eB
Match a position that is not a word boundary.
.It Ic \ef
Form Feed
.It Ic \en
Line Feed
.It Ic \er
Carriage Return
.It Ic \es
Alias for
.Ql [[:space:]]
.It Ic \eS
Alias for
.Ql [^[:space:]]
.It Ic \et
Horizontal Tab
.It Ic \ev
Vertical Tab
.It Ic \ew
Alias for
.Ql [[:alnum:]_]
.It Ic \eW
Alias for
.Ql [^[:alnum:]_]
.It Ic \e'
Match the end of the subject string (the string to be matched).
.It Ic \e`
Match the beginning of the subject string.
.El
.Pp
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.El
.Pp
When successful,
.Fn regcomp
returns 0 and fills in the structure pointed to by
.Fa preg .
One member of that structure
.Pq other than Fa re_endp
is publicized:
.Fa re_nsub ,
of type
.Ft size_t ,
contains the number of parenthesized subexpressions within the RE
.Po
except that the value of this member is undefined if the
.Dv REG_NOSUB
flag was used
.Pc .
If
.Fn regcomp
fails, it returns a non-zero error code;
see
.Sx RETURN VALUES .
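.Pp
Because entry 0 of the match array reports the whole match and entries 1
through
.Fa re_nsub
report the subexpressions, a caller can size that array as
.Fa re_nsub
+ 1 entries.
A minimal sketch that reads
.Fa re_nsub ,
assuming extended REs; the helper name
.Fn count_subexpressions
is hypothetical:
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper: return the number of parenthesized
 * subexpressions in the extended RE "pattern", or -1 if it does
 * not compile.
 */
long
count_subexpressions(const char *pattern)
{
	regex_t re;
	long n;

	/* No REG_NOSUB here: with it, re_nsub would be undefined. */
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	n = (long)re.re_nsub;
	regfree(&re);
	return n;
}
```
.Ed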
.Pp
The
.Fn regexec
function
matches the compiled RE pointed to by
.Fa preg
against the
.Fa string ,
subject to the flags in
.Fa eflags ,
and reports results using
.Fa nmatch ,
.Fa pmatch ,
and the returned value.
The RE must have been compiled by a previous invocation of
.Fn regcomp .
The compiled form is not altered during execution of
.Fn regexec ,
so a single compiled RE can be used simultaneously by multiple threads.
.Pp
By default,
the NUL-terminated string pointed to by
.Fa string
is considered to be the text of an entire line, minus any terminating
newline.
The
.Fa eflags
argument is the bitwise
.Em or
of zero or more of the following flags:
.Bl -tag -width Dv
.
.It Dv REG_NOTBOL
The first character of the string is treated as the continuation
of a line.
This means that the anchors
.Ql \&^ ,
.Ql [[:<:]] ,
and
.Ql \e<
do not match before it; but see
.Dv REG_STARTEND
below.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.
.It Dv REG_NOTEOL
The NUL terminating
the string
does not end a line, so the
.Ql \&$
anchor does not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.
.It Dv REG_STARTEND
The string is considered to start at
.Sm off
.Fa string Li " + " Fa pmatch Li [0]. Fa rm_so
.Sm on
and to end before the byte located at
.Sm off
.Fa string Li " + " Fa pmatch Li [0]. Fa rm_eo ,
.Sm on
regardless of the value of
.Fa nmatch .
See below for the definition of
.Fa pmatch
and
.Fa nmatch .
This is an extension,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
.Pp
Without
.Dv REG_NOTBOL ,
the position
.Fa rm_so
is considered the beginning of a line, such that
.Ql \&^
matches before it, and the beginning of a word if there is a word
character at this position, such that
.Ql [[:<:]]
and
.Ql \e<
match before it.
.Pp
With
.Dv REG_NOTBOL ,
the character at position
.Fa rm_so
is treated as the continuation of a line, and if
.Fa rm_so
is greater than 0, the preceding character is taken into consideration.
If the preceding character is a newline and the regular expression was compiled
with
.Dv REG_NEWLINE ,
.Ql \&^
matches before the string; if the preceding character is not a word character
but the string starts with a word character,
.Ql [[:<:]]
and
.Ql \e<
match before the string.
.El
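.Pp
The following sketch illustrates
.Dv REG_STARTEND :
it searches only part of a string, and the offsets it gets back are
still measured from the start of
.Fa string ,
not from
.Fa rm_so .
The helper
.Fn search_range
is hypothetical, and because
.Dv REG_STARTEND
is an extension, the sketch assumes a system that provides it.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper: search "s" only between byte offsets "start"
 * (inclusive) and "end" (exclusive).  On success *out receives the
 * whole match, with offsets measured from "s" itself.  Returns the
 * regexec() result.
 */
int
search_range(const regex_t *re, const char *s, regoff_t start,
    regoff_t end, regmatch_t *out)
{
	regmatch_t pm[1];
	int rc;

	pm[0].rm_so = start;	/* input: where the string starts */
	pm[0].rm_eo = end;	/* input: where the string ends */
	rc = regexec(re, s, 1, pm, REG_STARTEND);
	if (rc == 0)
		*out = pm[0];	/* output: offsets of the whole match */
	return rc;
}
```
.Ed
.Pp
For example, searching
.Ql "foo bar baz"
for the extended RE
.Ql "ba[rz]"
between offsets 5 and 11 skips
.Ql bar
and reports
.Ql baz
at offsets 8 to 11.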
.Pp
See
.Xr re_format 7
for a discussion of what is matched in situations where an RE or a
portion thereof could match any of several substrings of
.Fa string .
.Pp
Normally,
.Fn regexec
returns 0 for success and the non-zero code
.Dv REG_NOMATCH
for failure.
Other non-zero error codes may be returned in exceptional situations;
see
.Sx RETURN VALUES .
.Pp
If
.Dv REG_NOSUB
was specified in the compilation of the RE,
or if
.Fa nmatch
is 0,
.Fn regexec
ignores the
.Fa pmatch
argument
.Po
but see below for the case where
.Dv REG_STARTEND
is specified
.Pc .
Otherwise,
.Fa pmatch
points to an array of
.Fa nmatch
structures of type
.Ft regmatch_t .
Such a structure has at least the members
.Va rm_so
and
.Va rm_eo ,
both of type
.Ft regoff_t
.Po
a signed arithmetic type at least as large as an
.Ft off_t
and a
.Ft ssize_t
.Pc ,
containing respectively the offset of the first character of a substring
and the offset of the first character after the end of the substring.
Offsets are measured from the beginning of the
.Fa string
argument given to
.Fn regexec .
An empty substring is denoted by equal offsets,
both indicating the character following the empty substring.
.Pp
The
.No 0 Ap th
member of the
.Fa pmatch
array is filled in to indicate what substring of
.Fa string
was matched by the entire RE.
Remaining members report what substring was matched by parenthesized
subexpressions within the RE;
member
.Va i
reports subexpression
.Va i ,
with subexpressions counted
.Pq starting at 1
by the order of their opening parentheses in the RE, left to right.
Unused entries in the array
.Po
corresponding either to subexpressions that
did not participate in the match at all, or to subexpressions that do not
exist in the RE,
that is,
.Va i
>
.Fa preg Ns Li -> Ns Fa re_nsub
.Pc
have both
.Fa rm_so
and
.Fa rm_eo
set to \-1.
If a subexpression participated in the match several times,
the reported substring is the last one it matched.
.Po
Note in particular that when the RE
.Ql "(b*)+"
matches
.Ql bbb ,
the parenthesized subexpression matches each of the three
.So Li b Sc Ns s
and then
an infinite number of empty strings following the last
.Ql b ,
so the reported substring is one of the empties.
.Pc
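.Pp
A sketch of reading those results back: the hypothetical helper
.Fn copy_submatch
copies one reported substring into a caller-supplied buffer and treats
an entry whose
.Fa rm_so
is \-1 as a subexpression that did not participate.
It is an illustration, not part of this library.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: copy the substring reported by "m" out of
 * "s" into "buf" (at most bufsz bytes, NUL-terminated).  Returns the
 * length copied, or -1 if the entry is unused (rm_so == -1) or buf
 * is too small.
 */
long
copy_submatch(const char *s, const regmatch_t *m, char *buf, size_t bufsz)
{
	size_t len;

	if (m->rm_so == -1)
		return -1;	/* unused entry: both offsets are -1 */
	len = (size_t)(m->rm_eo - m->rm_so);
	if (len >= bufsz)
		return -1;	/* no room for the substring plus NUL */
	memcpy(buf, s + m->rm_so, len);
	buf[len] = 0;		/* NUL-terminate the copy */
	return (long)len;
}
```
.Ed
.Pp
For example, when the extended RE
.Ql "(a)|(b)"
matches
.Ql xb ,
entry 1 is unused, with both offsets \-1, and entry 2 covers offsets 1 to 2.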
.Pp
If
.Dv REG_STARTEND
is specified,
.Fa pmatch
must point to at least one
.Ft regmatch_t
.Po
even if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified
.Pc ,
to hold the input offsets for
.Dv REG_STARTEND .
Use for output is still entirely controlled by
.Fa nmatch ;
if
.Fa nmatch
is 0 or
.Dv REG_NOSUB
was specified,
the value of
.Fa pmatch Ns Li [0]
will not be changed by a successful
.Fn regexec .
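.Pp
One use of this rule is matching a buffer that is not NUL-terminated, or
that contains NUL bytes, assuming as the definition of
.Dv REG_STARTEND
above implies that every byte before
.Fa rm_eo
belongs to the string.
With
.Fa nmatch
set to 0,
.Fa pmatch Ns Li [0]
serves only as input.
The helper
.Fn matches_buffer
is hypothetical and assumes a system that provides
.Dv REG_STARTEND .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if "re" matches somewhere in the
 * first "len" bytes of "buf", which need not be NUL-terminated.
 * nmatch is 0, so pm[0] only supplies the REG_STARTEND range.
 */
int
matches_buffer(const regex_t *re, const char *buf, size_t len)
{
	regmatch_t pm[1];

	pm[0].rm_so = 0;		/* search from the first byte */
	pm[0].rm_eo = (regoff_t)len;	/* up to, not including, buf[len] */
	return regexec(re, buf, 0, pm, REG_STARTEND) == 0;
}
```
.Ed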
.Pp
The
.Fn regerror
function
maps a non-zero
.Fa errcode
from either
.Fn regcomp
or
.Fn regexec
to a human-readable, printable message.
If
.Fa preg
is
.Pf non- Dv NULL ,
the error code should have arisen from use of
the
.Ft regex_t
pointed to by
.Fa preg ,
and if the error code came from
.Fn regcomp ,
it should have been the result from the most recent
.Fn regcomp
using that
.Ft regex_t
.Po
.Fn regerror
may be able to supply a more detailed message using information
from the
.Ft regex_t
.Pc .
The
.Fn regerror
function
places the NUL-terminated message into the buffer pointed to by
.Fa errbuf ,
limiting the length
.Pq including the Tn NUL
to at most
.Fa errbuf_size
bytes.
If the whole message will not fit,
as much of it as will fit before the terminating NUL is supplied.
In any case,
the returned value is the size of buffer needed to hold the whole
message
.Pq including terminating Tn NUL .
If
.Fa errbuf_size
is 0,
.Fa errbuf
is ignored but the return value is still correct.
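.Pp
Because a call with an
.Fa errbuf_size
of 0 still reports the size needed, a caller can allocate a buffer of
exactly that size and call
.Fn regerror
again.
A minimal sketch of that two-call pattern; the helper name
.Fn regerror_alloc
is hypothetical:
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc(3)ed copy of the message for
 * "errcode", sized by a first regerror call with errbuf_size 0.
 * The caller frees the result; NULL means out of memory.
 */
char *
regerror_alloc(int errcode, const regex_t *re)
{
	size_t need;
	char *msg;

	need = regerror(errcode, re, NULL, 0);	/* size including NUL */
	if ((msg = malloc(need)) == NULL)
		return NULL;
	(void)regerror(errcode, re, msg, need);
	return msg;
}
```
.Ed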
.Pp
If the
.Fa errcode
given to
.Fn regerror
is first
.Em or Ap ed
with
.Dv REG_ITOA ,
the
.Dq message
that results is the printable name of the error code,
e.g.\&
.Dq Dv REG_NOMATCH ,
rather than an explanation thereof.
If
.Fa errcode
is
.Dv REG_ATOI ,
then
.Fa preg
shall be
.Pf non- Dv NULL
and the
.Fa re_endp
member of the structure it points to
must point to the printable name of an error code;
in this case, the result in
.Fa errbuf
is the decimal digits of
the numeric value of the error code
.Pq 0 if the name is not recognized .
.Dv REG_ITOA
and
.Dv REG_ATOI
are intended primarily as debugging facilities;
they are extensions,
compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
Be warned also that they are considered experimental and changes are possible.
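.Pp
Where
.Dv REG_ITOA
and
.Dv REG_ATOI
are unavailable, a lookup table can provide similar name conversions.
The following sketch is hypothetical: the helpers
.Fn re_code_name
and
.Fn re_code_value
are not part of this library, and the table covers only error codes
that
.St -p1003.2
defines, not extensions such as
.Dv REG_EMPTY
or
.Dv REG_INVARG .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Pair each code with its spelled-out name via stringification. */
#define ENTRY(code) { code, #code }

static const struct {
	int code;
	const char *name;
} re_codes[] = {
	ENTRY(REG_NOMATCH), ENTRY(REG_BADPAT), ENTRY(REG_ECOLLATE),
	ENTRY(REG_ECTYPE), ENTRY(REG_EESCAPE), ENTRY(REG_ESUBREG),
	ENTRY(REG_EBRACK), ENTRY(REG_EPAREN), ENTRY(REG_EBRACE),
	ENTRY(REG_BADBR), ENTRY(REG_ERANGE), ENTRY(REG_ESPACE),
	ENTRY(REG_BADRPT),
};

/* Like REG_ITOA: map a code to its name, or NULL if unknown. */
const char *
re_code_name(int code)
{
	size_t i;

	for (i = 0; i < sizeof re_codes / sizeof re_codes[0]; i++)
		if (re_codes[i].code == code)
			return re_codes[i].name;
	return NULL;
}

/* Like REG_ATOI: map a name to its code, or 0 if unrecognized. */
int
re_code_value(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof re_codes / sizeof re_codes[0]; i++)
		if (strcmp(re_codes[i].name, name) == 0)
			return re_codes[i].code;
	return 0;
}
```
.Ed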
.Pp
The
.Fn regfree
function
frees any dynamically-allocated storage associated with the compiled RE
pointed to by
.Fa preg .
The remaining
.Ft regex_t
is no longer a valid compiled RE
and the effect of supplying it to
.Fn regexec
or
.Fn regerror
is undefined.
.Pp
None of these functions references global variables except for tables
of constants;
all are safe for use from multiple threads if the arguments are safe.
.Pp
The
.Fn regnsub
and
.Fn regasub
functions perform substitutions using
.Xr sed 1 Ns -like
syntax.
They return the length of the string that would have been created
if there were enough space, or \-1 on error, setting
.Va errno .
The result is placed in
.Fa buf ,
which is supplied by the caller for
.Fn regnsub
and dynamically allocated by
.Fn regasub .
The
.Fa sub
argument contains a substitution string that may refer to the first
9 parenthesized subexpression matches using
.So Ic \e Ns Ar N Sc ,
where
.Ar N
is the number of the matched item, or to the full match using
.Ql &
.Po
which is equivalent to
.Ic \e0
.Pc .
The
.Fa rm
array must be at least 10 elements long and should contain the results
of a previous
.Fn regexec
call;
only its first 10 elements are used.
The
.Fa str
argument contains the source string to which the transformation is applied.
.Sh IMPLEMENTATION CHOICES
There are a number of decisions that
.St -p1003.2
leaves up to the implementor,
either by explicitly saying
.Dq undefined
or by virtue of them being
forbidden by the RE grammar.
This implementation treats them as follows.
.Pp
See
.Xr re_format 7
for a discussion of the definition of case-independent matching.
.Pp
There is no particular limit on the length of REs,
except insofar as memory is limited.
Memory usage is approximately linear in RE size, and largely insensitive
to RE complexity, except for bounded repetitions.
See
.Sx BUGS
for one short RE using them
that will run almost any system out of memory.
.Pp
A backslashed character other than one specifically given a magic meaning
by
.St -p1003.2
.Po
such magic meanings occur only in obsolete
.Pq Dq basic
REs
.Pc
is taken as an ordinary character.
.Pp
Any unmatched
.Ql \&[
is a
.Dv REG_EBRACK
error.
.Pp
Equivalence classes cannot begin or end bracket-expression ranges.
The endpoint of one range cannot begin another.
.Pp
.Dv RE_DUP_MAX ,
the limit on repetition counts in bounded repetitions, is 255.
.Pp
A repetition operator
.Po
.Ql \&? ,
.Ql \&* ,
.Ql \&+ ,
or bounds
.Pc
cannot follow another
repetition operator.
A repetition operator cannot begin an expression or subexpression
or follow
.Ql \&^
or
.Ql \&| .
.Pp
.Ql \&|
cannot appear first or last in a (sub)expression or after another
.Ql \&| ,
i.e., an operand of
.Ql \&|
cannot be an empty subexpression.
An empty parenthesized subexpression,
.Ql "()" ,
is legal and matches an
empty (sub)string.
An empty string is not a legal RE.
.Pp
A
.Ql \&{
followed by a digit is considered the beginning of bounds for a
bounded repetition, which must then follow the syntax for bounds.
A
.Ql \&{
.Em not
followed by a digit is considered an ordinary character.
.Pp
.Ql \&^
and
.Ql \&$
beginning and ending subexpressions in obsolete
.Pq Dq basic
REs are anchors, not ordinary characters.
.Sh RETURN VALUES
Non-zero error codes from
.Fn regcomp
and
.Fn regexec
include the following:
.Pp
.Bl -tag -width ".Dv REG_ECOLLATE" -compact
.It Dv REG_NOMATCH
The
.Fn regexec
function
failed to match
.It Dv REG_BADPAT
invalid regular expression
.It Dv REG_ECOLLATE
invalid collating element
.It Dv REG_ECTYPE
invalid character class
.It Dv REG_EESCAPE
.Ql \e
applied to unescapable character
.It Dv REG_ESUBREG
invalid backreference number
.It Dv REG_EBRACK
brackets
.Ql "[ ]"
not balanced
.It Dv REG_EPAREN
parentheses
.Ql "( )"
not balanced
.It Dv REG_EBRACE
braces
.Ql "{ }"
not balanced
.It Dv REG_BADBR
invalid repetition count(s) in
.Ql "{ }"
.It Dv REG_ERANGE
invalid character range in
.Ql "[ ]"
.It Dv REG_ESPACE
ran out of memory
.It Dv REG_BADRPT
.Ql \&? ,
.Ql \&* ,
or
.Ql \&+
operand invalid
.It Dv REG_EMPTY
empty (sub)expression
.It Dv REG_ASSERT
cannot happen - you found a bug
.It Dv REG_INVARG
invalid argument, e.g.\& negative-length string
.It Dv REG_ILLSEQ
illegal byte sequence (bad multibyte character)
.El
.Sh SEE ALSO
.Xr grep 1 ,
.Xr re_format 7
.Pp
.St -p1003.2 ,
sections 2.8
.Pq Regular Expression Notation
and
B.5
.Pq Tn C No Binding for Regular Expression Matching .
.Sh HISTORY
Originally written by
.An Henry Spencer .
Altered for inclusion in the
.Bx 4.4
distribution.
.Pp
The
.Fn regnsub
and
.Fn regasub
functions appeared in
.Nx 8 .
.Sh BUGS
This is an alpha release with known defects.
Please report problems.
.Pp
The back-reference code is subtle and doubts linger about its correctness
in complex cases.
.Pp
The performance of
.Fn regexec
is poor.
This will improve with later releases.
A value of
.Fa nmatch
greater than 0 is expensive;
a value greater than 1 is worse.
The
.Fn regexec
function
is largely insensitive to RE complexity
.Em except
that back
references are massively expensive.
RE length does matter; in particular, there is a strong speed bonus
for keeping RE length under about 30 characters,
with most special characters counting roughly double.
.Pp
The
.Fn regcomp
function
implements bounded repetitions by macro expansion,
which is costly in time and space if counts are large
or bounded repetitions are nested.
An RE like, say,
.Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}"
will (eventually) run almost any existing machine out of swap space.
.Pp
There are suspected problems with response to obscure error conditions.
Notably,
certain kinds of internal overflow,
produced only by truly enormous REs or by multiply nested bounded repetitions,
are probably not handled well.
.Pp
Due to a mistake in
.St -p1003.2 ,
things like
.Ql "a)b"
are legal REs because
.Ql \&)
is
a special character only in the presence of a previous unmatched
.Ql \&( .
This cannot be fixed until the spec is fixed.
.Pp
The standard's definition of back references is vague.
For example, does
.Ql "a\e(\e(b\e)*\e2\e)*d"
match
.Ql "abbbd" ?
Until the standard is clarified,
behavior in such cases should not be relied on.
.Pp
The implementation of word-boundary matching is a bit of a kludge,
and bugs may lurk in combinations of word-boundary matching and anchoring.
.Pp
Word-boundary matching does not work properly in multibyte locales.
939