This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

	Copyright (c) 1986 by University of Toronto.
	Written by Henry Spencer.  Not derived from licensed software.

	Permission is granted to anyone to use this software for any
	purpose on any computer system, and to redistribute it freely,
	subject to the following restrictions:

	1. The author is not responsible for the consequences of use of
		this software, no matter how awful, even if they arise
		from defects in it.

	2. The origin of this software must not be misrepresented, either
		by explicit claim or by omission.

	3. Altered versions must be plainly marked as such, and must not
		be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8.  It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software, even though U of T is a V8 licensee.  This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points.  I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically:  put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r".  This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests.  If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile	instructions to make everything
regexp.3	manual page
regexp.h	header file, for /usr/include
regexp.c	source for regcomp() and regexec()
regsub.c	source for regsub()
regerror.c	source for default regerror()
regmagic.h	internal header file
try.c		source for test program
timer.c		source for timing program
tests		test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them.  In theory, anyway.  This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly.  In general,
if you want blazing speed you're in the wrong place.  Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work.  But if you want to use regular
expressions a little bit in something else, you're in luck.  Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.

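To give a feel for what "nondeterministic" matching costs and buys, here is
a toy backtracking matcher.  It is a sketch for illustration only and is NOT
the code in regexp.c: the names tiny_match(), match_here(), and match_star()
are made up for this example, and it handles only literal characters, '.',
'*', '^', and '$'.  The point is that there is nothing to compile (the
pattern is walked directly), but a failed guess has to be undone and retried.

```c
#include <stddef.h>

static int match_here(const char *re, const char *text);

/* Match zero or more of c, followed by the rest of the pattern re. */
static int match_star(int c, const char *re, const char *text)
{
    const char *t = text;

    /* Greedy guess: swallow as many matching characters as possible. */
    while (*t != '\0' && (c == '.' || *t == c))
        t++;

    /* Backtrack: give characters back one at a time until the rest fits. */
    for (;;) {
        if (match_here(re, t))
            return 1;
        if (t == text)
            return 0;
        t--;
    }
}

/* Match the pattern re against the beginning of text. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;                       /* pattern used up: success */
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';           /* '$' matches only at the end */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Return 1 if re matches anywhere in text, 0 otherwise. */
int tiny_match(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text);    /* anchored: one attempt only */
    do {
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');              /* try each starting position */
    return 0;
}
```

Calling tiny_match("a*a", "aaa") shows the backtracking at work: the star
first swallows all three characters, the trailing "a" then fails, and the
matcher backs off one character before it succeeds.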
This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized.  There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line.  Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.
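
As an illustration of leaning on string(3) instead of coding a loop in line,
the sketch below finds a literal string by letting strchr() jump to each
place where its first character occurs, and only then doing the full
comparison.  This is an assumed example of the general idea, not an excerpt
from regexp.c, and find_literal() is a hypothetical name.

```c
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the first occurrence of the
 * literal string needle in text, or NULL if there is none.
 */
const char *find_literal(const char *text, const char *needle)
{
    size_t n = strlen(needle);
    const char *s = text;

    if (n == 0)
        return text;            /* the empty string matches at the start */

    /* Let strchr() skip directly to each candidate starting position. */
    while ((s = strchr(s, needle[0])) != NULL) {
        if (strncmp(s, needle, n) == 0)
            return s;           /* full comparison succeeded here */
        s++;                    /* false alarm: resume just past it */
    }
    return NULL;
}
```

Whether this actually beats a hand-written loop depends on how well the
local strchr() is tuned, which is the machine-specific part.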