# $NetBSD: README,v 1.2 1998/01/09 04:12:00 perry Exp $

This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

    Copyright (c) 1986 by University of Toronto.
    Written by Henry Spencer. Not derived from licensed software.

    Permission is granted to anyone to use this software for any
    purpose on any computer system, and to redistribute it freely,
    subject to the following restrictions:

    1. The author is not responsible for the consequences of use of
        this software, no matter how awful, even if they arise
        from defects in it.

    2. The origin of this software must not be misrepresented, either
        by explicit claim or by omission.

    3. Altered versions must be plainly marked as such, and must not
        be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8. It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).
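
To illustrate what the "sort of" binary compatibility relies on, here is a
minimal sketch in C. It assumes the public part of the struct is two arrays
of match pointers named startp and endp with NSUBEXP (10) slots each; those
names and that count are assumptions based on the usual regexp(3) interface,
not something stated in this file, and every other field and function name
below is hypothetical. The point is only that a caller who knows just the
leading fields can read them correctly whatever private fields follow.

```c
#include <assert.h>   /* assert, static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <string.h>   /* memcpy */

#define NSUBEXP 10    /* assumed number of subexpression slots */

/* What a caller is assumed to know about: only the match pointers. */
struct public_view {
    const char *startp[NSUBEXP];
    const char *endp[NSUBEXP];
};

/* A hypothetical implementation: same leading fields, then private ones. */
struct impl_regexp {
    const char *startp[NSUBEXP];
    const char *endp[NSUBEXP];
    char first_char;   /* hypothetical private field */
    int  must_len;     /* hypothetical private field */
    char program[1];   /* hypothetical compiled form */
};

/* Compile-time check that the shared fields sit at the same offsets. */
static_assert(offsetof(struct public_view, startp) ==
              offsetof(struct impl_regexp, startp), "startp moved");
static_assert(offsetof(struct public_view, endp) ==
              offsetof(struct impl_regexp, endp), "endp moved");

/* Copy out the leading fields, ignoring whatever follows them. */
struct public_view view_of(const struct impl_regexp *r)
{
    struct public_view v;

    assert(r != NULL);
    memcpy(&v, r, sizeof v);
    return v;
}

/* Length of subexpression n, using the public fields only. */
size_t match_length(const struct public_view *v, int n)
{
    assert(v != NULL && n >= 0 && n < NSUBEXP);
    if (v->startp[n] == NULL || v->endp[n] == NULL)
        return 0;
    return (size_t)(v->endp[n] - v->startp[n]);
}
```

The compile-time checks fail if the shared fields ever move. Copying the
prefix with memcpy() avoids reading one struct type through a pointer to
another.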

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software, even though U of T is a V8 licensee. This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points. I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically: put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r". This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests. If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile    instructions to make everything
regexp.3    manual page
regexp.h    header file, for /usr/include
regexp.c    source for regcomp() and regexec()
regsub.c    source for regsub()
regerror.c  source for default regerror()
regmagic.h  internal header file
try.c       source for test program
timer.c     source for timing program
tests       test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them. In theory, anyway. This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly. In general,
if you want blazing speed you're in the wrong place. Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work. But if you want to use regular
expressions a little bit in something else, you're in luck. Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.
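
As a rough picture of what the nondeterministic approach means in practice,
the following C sketch matches by trial and error: it tries one alternative,
and backs up to try another if the rest of the pattern fails. It is not the
code in regexp.c, which compiles the expression into an internal form first.
It interprets a tiny pattern language directly (literal characters, ".",
"*" after a single character, "^" and "$"), and the function names are
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

static int match_here(const char *re, const char *text);

/* Match "c*" followed by the rest of the pattern. Take as many c's as
 * possible, then give them back one at a time until the rest matches. */
static int match_star(int c, const char *re, const char *text)
{
    const char *t = text;

    while (*t != '\0' && (c == '.' || *t == c))
        t++;
    for (;;) {
        if (match_here(re, t))
            return 1;
        if (t == text)
            return 0;       /* nothing left to give back */
        t--;                /* back up and try again */
    }
}

/* Match the pattern at exactly this position in the text. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Return 1 if the pattern matches anywhere in the text, else 0. */
int tiny_match(const char *re, const char *text)
{
    assert(re != NULL && text != NULL);
    if (re[0] == '^')
        return match_here(re + 1, text);
    do {                    /* try every starting position in turn */
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

For example, tiny_match("a*a", "aaa") succeeds only because match_star()
gives back one "a" after first taking all three. That backing up is where
the execution cost of this style of matcher tends to come from.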

This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized. There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line. Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.
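
One way to picture both points, leaning on string(3) and treating all char
values alike, is the C sketch below. It is an illustration under assumptions
rather than an excerpt from regexp.c: find_literal() and literal_at() are
hypothetical names. It uses strchr() to jump to each place where the first
character of a literal occurs, instead of stepping through the text by hand,
and it handles characters with the top bit set like any others.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: does the literal occur starting exactly at s? */
static int literal_at(const char *s, const char *lit)
{
    return strncmp(s, lit, strlen(lit)) == 0;
}

/* Find the first occurrence of lit in text, letting strchr() skip
 * ahead to positions where the first character already matches. */
const char *find_literal(const char *text, const char *lit)
{
    assert(text != NULL && lit != NULL);
    if (lit[0] == '\0')
        return text;        /* empty literal matches at the start */
    for (const char *s = text; (s = strchr(s, lit[0])) != NULL; s++) {
        if (literal_at(s, lit))
            return s;
    }
    return NULL;            /* only NUL ends the search */
}
```

Whether this beats a hand-written loop depends on the local strchr(), which
is the machine-specific part.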