This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

	Copyright (c) 1986 by University of Toronto.
	Written by Henry Spencer.  Not derived from licensed software.

	Permission is granted to anyone to use this software for any
	purpose on any computer system, and to redistribute it freely,
	subject to the following restrictions:

	1. The author is not responsible for the consequences of use of
		this software, no matter how awful, even if they arise
		from defects in it.

	2. The origin of this software must not be misrepresented, either
		by explicit claim or by omission.

	3. Altered versions must be plainly marked as such, and must not
		be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8.  It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software.  Even though U of T is a V8 licensee.  This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points.  I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically:  put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r".  This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests.  If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile        instructions to make everything
regexp.3        manual page
regexp.h        header file, for /usr/include
regexp.c        source for regcomp() and regexec()
regsub.c        source for regsub()
regerror.c      source for default regerror()
regmagic.h      internal header file
try.c           source for test program
timer.c         source for timing program
tests           test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them.  In theory, anyway.  This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly.  In general,
if you want blazing speed you're in the wrong place.  Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work.  But if you want to use regular
expressions a little bit in something else, you're in luck.  Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.

This stuff should be pretty portable, given appropriate option settings.
If your chars have less than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized.
There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line.  Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.