#	$NetBSD: README,v 1.2 1998/01/09 04:12:00 perry Exp $

This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

	Copyright (c) 1986 by University of Toronto.
	Written by Henry Spencer.  Not derived from licensed software.

	Permission is granted to anyone to use this software for any
	purpose on any computer system, and to redistribute it freely,
	subject to the following restrictions:

	1. The author is not responsible for the consequences of use of
		this software, no matter how awful, even if they arise
		from defects in it.

	2. The origin of this software must not be misrepresented, either
		by explicit claim or by omission.

	3. Altered versions must be plainly marked as such, and must not
		be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8.  It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).
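As a rough illustration of what "public fields at the beginning" means,
here is a sketch of the struct layout.  The field names and NSUBEXP value
are assumptions modeled on the regexp.h commonly shipped with this package;
this README does not list them, so check your own header before relying on
them.  The helper sublen() is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NSUBEXP 10	/* assumed: number of subexpressions tracked */

/*
 * Assumed layout: the two public arrays come first, so code that only
 * touches startp[] and endp[] does not care what follows them.
 */
typedef struct regexp {
	char *startp[NSUBEXP];	/* public: start of each matched substring */
	char *endp[NSUBEXP];	/* public: one past the end of each substring */
	char regstart;		/* everything from here on is internal */
	char reganch;
	char *regmust;
	int regmlen;
	char program[1];	/* compiled program; really longer than 1 */
} regexp;

/*
 * Hypothetical helper: length of subexpression n (0 is the whole match),
 * or 0 if that subexpression did not take part in the match.
 */
size_t
sublen(const regexp *r, int n)
{
	assert(r != NULL && n >= 0 && n < NSUBEXP);
	if (r->startp[n] == NULL || r->endp[n] == NULL)
		return 0;
	return (size_t)(r->endp[n] - r->startp[n]);
}
```

A caller that reads only startp[] and endp[], as sublen() does, depends on
nothing past the beginning of the struct, which is the sense in which the
compatibility claim above should hold.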

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software, even though U of T is a V8 licensee.  This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points.  I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically:  put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r".  This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests.  If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS first.

The files are:

Makefile	instructions to make everything
regexp.3	manual page
regexp.h	header file, for /usr/include
regexp.c	source for regcomp() and regexec()
regsub.c	source for regsub()
regerror.c	source for default regerror()
regmagic.h	internal header file
try.c		source for test program
timer.c		source for timing program
tests		test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them.  In theory, anyway.  This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly.  In general,
if you want blazing speed you're in the wrong place.  Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work.  But if you want to use regular
expressions a little bit in something else, you're in luck.  Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.
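To give a feel for the nondeterministic (backtracking) approach, here is a
toy matcher in the style of Rob Pike's well-known small example.  It is
NOT the code in regexp.c: it compiles nothing, interprets the pattern
directly, and handles only literal characters, ".", "^", "$", and "*".
The name bt_match() is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

static int matchhere(const char *re, const char *text);

/* Greedy star: take as many c's as possible, then back off one at a time. */
static int
matchstar(int c, const char *re, const char *text)
{
	const char *t = text;

	while (*t != '\0' && (c == '.' || *t == c))
		t++;
	for (;;) {
		if (matchhere(re, t))
			return 1;
		if (t == text)
			return 0;	/* nothing left to give back */
		t--;
	}
}

/* Match re against the beginning of text. */
static int
matchhere(const char *re, const char *text)
{
	if (re[0] == '\0')
		return 1;
	if (re[1] == '*')
		return matchstar(re[0], re + 2, text);
	if (re[0] == '$' && re[1] == '\0')
		return *text == '\0';
	if (*text != '\0' && (re[0] == '.' || re[0] == *text))
		return matchhere(re + 1, text + 1);
	return 0;
}

/* Return 1 if re matches anywhere in text, 0 otherwise. */
int
bt_match(const char *re, const char *text)
{
	assert(re != NULL && text != NULL);
	if (re[0] == '^')
		return matchhere(re + 1, text);
	do {	/* try every starting position, including the empty tail */
		if (matchhere(re, text))
			return 1;
	} while (*text++ != '\0');
	return 0;
}
```

For instance, bt_match("^ab*c$", "abbbc") succeeds and bt_match("a.*a", "abc")
fails.  The backing-off loop in matchstar() is where the time goes on awkward
patterns, which is the execution-speed cost mentioned above.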

This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized.  There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line.  Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.
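As a small illustration of leaning on string(3) instead of hand-written
loops, the sketch below finds a literal string by letting strchr() skip to
each candidate first character and strncmp() check the rest.  The function
find_literal() is hypothetical and is not taken from regexp.c; it only
shows the style of coding meant here.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the first occurrence of lit
 * in text, or NULL if there is none.  An empty lit matches at text.
 */
const char *
find_literal(const char *text, const char *lit)
{
	const char *s;
	size_t len;

	assert(text != NULL && lit != NULL);
	len = strlen(lit);
	if (len == 0)
		return text;
	/* strchr() does the scanning; we only look closely at candidates. */
	for (s = strchr(text, lit[0]); s != NULL; s = strchr(s + 1, lit[0]))
		if (strncmp(s, lit, len) == 0)
			return s;
	return NULL;
}
```

Whether this beats an open-coded loop depends on how well the local library
implements strchr() and strncmp(), which is the machine-specific part.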