
        @(#)README      5.1 (Berkeley) 06/06/85

Enclosed is compress version 3.0 with the following changes:

1.      "Block" compression is performed.  After the BITS run out, the
        compression ratio is checked every so often.  If it is decreasing,
        the table is cleared and a new set of substrings is generated.

        This makes the output of compress 3.0 incompatible with that of
        compress 2.0.  However, compress 3.0 still accepts the output of
        compress 2.0.  To generate output that is compatible with compress
        2.0, use the undocumented "-C" flag.

2.      A quiet "-q" flag has been added for use by the news system.

3.      The character chaining has been deleted and the program now uses
        hashing.  This boosts speed, especially during compression of
        large files.  Other speed improvements have been made, such as
        using putc() instead of fwrite().

4.      A large table is used on large machines when a relatively small
        number of bits is specified.  This saves much time when compressing
        for a 16-bit machine on a 32-bit virtual machine.

Most of these changes were made by James A. Woods (ames!jaw).  Thank you,
James!

To compile compress:

        cc -O -DUSERMEM=usermem -o compress compress.c

where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.
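
The memory table that follows maps the difference "usermem-sacredmem" to a
maximum BITS and a cutoff.  As a rough illustration only, that mapping can be
sketched in C as a table lookup.  The struct and function names (mem_row,
lookup_mem) are hypothetical and are not taken from compress.c; a cutoff of 0
stands for the "--" entries; and compress itself makes this choice at compile
time from the -D options rather than at run time.

```c
#include <stddef.h>

/* One row of the README's memory table (hypothetical type name). */
struct mem_row {
    long at_least;   /* minimum usermem - sacredmem, in bytes */
    int  max_bits;   /* maximum BITS that can be specified */
    int  cutoff;     /* cutoff bits for the large+fast table; 0 means "--" */
};

/* Rows copied from the README table, largest threshold first. */
static const struct mem_row mem_table[] = {
    { 4718592L, 16, 13 },
    { 2621440L, 16, 12 },
    { 1572864L, 16, 11 },
    {  631808L, 16,  0 },
    {  329728L, 15,  0 },
    {  178176L, 14,  0 },
    {   99328L, 13,  0 },
    {       0L, 12,  0 },
};

/* Hypothetical helper: return the first row whose threshold is met. */
static const struct mem_row *lookup_mem(long usermem, long sacredmem)
{
    long avail = usermem - sacredmem;
    size_t n = sizeof mem_table / sizeof mem_table[0];
    size_t i;

    if (avail < 0)
        avail = 0;              /* treat an over-reserved machine as 0 */
    for (i = 0; i < n; i++)
        if (avail >= mem_table[i].at_least)
            return &mem_table[i];
    return &mem_table[n - 1];   /* not reached: the last threshold is 0 */
}
```

For example, lookup_mem(750000L, 0L) selects the 631,808 row, which matches
the default described below: a maximum BITS of 16 and no large+fast table.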

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least        BITS            cutoff
------  -- -----        ----            ------
   4,718,592             16               13
   2,621,440             16               12
   1,572,864             16               11
     631,808             16               --
     329,728             15               --
     178,176             14               --
      99,328             13               --
           0             12               --

The default memory size is 750,000, which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
        cd /usr/local
        ln compress uncompress
        ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer.)

Next, install the manual (compress.l).
        cp compress.l /usr/man/manl
        cd /usr/man/manl
        ln compress.l uncompress.l
        ln compress.l zcat.l

                - or -

        cp compress.l /usr/man/man1/compress.1
        cd /usr/man/man1
        ln compress.1 uncompress.1
        ln compress.1 zcat.1

                                        regards,
                                        petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.  The packed files produced by compress are different on different
>    machines and dependent on the vax sysgen option.
>        The bug was in the different byte/bit ordering on the
>        various machines.  This has been fixed.
>
>        This version is NOT compatible with the original vax posting
>        unless the '-DCOMPATIBLE' option is specified to the C
>        compiler.  The original posting has a bug which I fixed,
>        causing incompatible files.  I recommend you NOT to use this
>        option unless you already have a lot of packed files from
>        the original posting by thomas.
>2.  The exit status is not well defined (on some machines) causing the
>    scripts to fail.
>        The exit status is now 0,1 or 2 and is documented in
>        compress.l.
>3.  The function getopt() is not available in all C libraries.
>        The function getopt() is no longer referenced by the
>        program.
>4.  Error status is not being checked on the fwrite() and fflush() calls.
>        Fixed.
>
>The following enhancements have been made:
>
>1.  Added facilities of "compact" into the compress program.  "Pack",
>    "Unpack", and "Pcat" are no longer required (no longer supplied).
>2.  Installed work around for C compiler bug with "-O".
>3.  Added a magic number header (\037\235).  Put the bits specified
>    in the file.
>4.  Added "-f" flag to force overwrite of output file.
>5.  Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>    compile.
>6.  The 'uncompress' script has been deleted; simply
>    'ln compress uncompress' after you compile and it will work.
>7.  Removed extra bit masking for machines that support unsigned
>    characters.
>    If your machine doesn't support unsigned characters,
>    define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>        cd /usr/local
>        ln compress uncompress
>        ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>        cp compress.l /usr/man/manl             - or -
>        cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer   (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>                                      regards,
>>                                      joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844
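
The 2.0 notes above mention a magic number header (\037\235).  A minimal
sketch in C of checking a buffer for that header is given here as an
illustration.  The helper name has_compress_magic is hypothetical and is not
taken from compress.c, and only the two magic bytes named in this README are
examined.

```c
#include <stddef.h>

/* Magic number from the README, written in octal as it appears there. */
static const unsigned char compress_magic[2] = { 037, 0235 };

/*
 * Hypothetical helper: return 1 if buf begins with the two-byte magic
 * number, 0 otherwise (including a NULL or too-short buffer).
 */
static int has_compress_magic(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 2)
        return 0;
    return buf[0] == compress_magic[0] && buf[1] == compress_magic[1];
}
```

A caller would read the first two bytes of a file into a buffer and pass
them to has_compress_magic() before attempting to uncompress it.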